I'm building an ontology in Protege for the first time and have never worked with the tool before.
I have a manufacturing process with two robots, a machine tool, two storages (S1 and S2), a working table, a computer vision system, a conveyor and 6 types of pieces (A, B, C, D, E, F). I have some goals set (e.g. storage S2 must have a piece of type A in position (row, column) (1,4) with orientation orientation1). I thought of creating a Robot class with the following properties: hasState (the robot can be free or can be holding a piece), hasPosition (the robot can be in one of four predefined positions) and hasPiece.
The question is the following: when I create the individuals for the two robots, what should I set for the hasPiece property? I need to create the ontology in Protege and then write a CLIPS program that solves the problem (it will move the pieces from storage S1 to storage S2 into the desired positions). Will the individuals be the initial facts? The only ontology examples I have seen are for pizza and countries, and those did not have properties that are modified while the CLIPS program runs.
Will the individuals be the initial facts?
I would assume so from your description.
Individuals and properties are created the same way no matter how they will subsequently be modified. I would assume that all you need to change from the pizza example is the names of the properties, classes and individuals.
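To make the "individuals become initial facts" idea concrete, here is a minimal sketch that renders robot individuals as CLIPS template facts inside a deffacts block. The robot names, the position symbols and the use of the symbol none for an empty hasPiece are my assumptions, not something Protege prescribes, and a matching deftemplate named robot would still have to be defined in the CLIPS program.

```python
# Hypothetical individuals mirroring the Robot class and its three properties.
# "none" is an assumed placeholder for a robot that currently holds no piece.
robots = [
    {"name": "R1", "hasState": "free", "hasPosition": "P1", "hasPiece": "none"},
    {"name": "R2", "hasState": "free", "hasPosition": "P2", "hasPiece": "none"},
]


def to_clips_fact(template, individual):
    """Render one individual as a CLIPS template fact, one slot per property."""
    slots = " ".join(f"({key} {value})" for key, value in individual.items())
    return f"({template} {slots})"


def to_deffacts(name, template, individuals):
    """Wrap the facts in a deffacts block, which CLIPS asserts on (reset)."""
    lines = [f"(deffacts {name}"]
    lines += [f"  {to_clips_fact(template, ind)}" for ind in individuals]
    return "\n".join(lines) + ")"


print(to_deffacts("initial-robots", "robot", robots))
```

Rules in the CLIPS program can then modify the hasState and hasPiece slots of these facts as pieces are picked up and put down.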
I am working on a use case where I need to predict a relationship between nodes of different types.
I have a graph something like this.
(:customer)-[:has]->(:session)
(:session)-[:contains]->(:order)
(:order)-[:has]->(:product)
(:order)-[:to]->(:relation)
There are many customers who have placed orders. Some of the orders specify for whom the order was intended (relation), i.e. mother, father etc., and some do not. For the latter orders I want to predict for whom the order was most likely intended.
I have prepared a Link Prediction ML pipeline in Neo4j. The gds.beta.pipeline.linkPrediction.predict.mutate procedure has 2 modes of prediction: exhaustive search and approximate search. The first predicts for all unconnected node pairs and the second applies KNN to predict. I want neither; rather, I want the model to predict links only between two specific node types, 'order' nodes and 'relation' nodes. How do I specify this in the predict procedure?
You can also frame this problem as node classification and get what you are looking for. Treat Relation as the target variable and it becomes a multi-class classification problem. Let's say that Relation is a categorical variable with a few types (Mother/Father/Sibling/Friend etc.) and the hypothesis is that, based on the properties of the Customer and Order nodes, we can predict which relation a certain order is intended for.
Examples of properties of Customer nodes are age, location, billing address etc., and the properties of the Order nodes are category, description, shipping address, billing address, location etc. Properties of Session nodes are probably not useful in predicting anything about the order or the relation that the order is intended for.
For running any algorithm in Neo4j, we have to project a graph into memory. Some of the properties on Customer and Order nodes are strings and graph projection procs do not support projecting strings into memory. Hence, the strings have to be converted into numerical values.
For example, Customer age can be used as is but the order description has to be converted into a word/phrase embedding using some NLP methodology etc. Some creative feature engineering also helps - instead of encoding billing/shipping addresses, a simple flag to identify if they are the same or different makes it easier to differentiate if the customer is shipping the order to his/her own address or to somewhere else.
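As a rough illustration of the address flag, something like the following would do. The function names and the very simple normalization are my own assumptions; real address matching usually needs more care.

```python
def normalize_address(address):
    """Lower-case, drop commas and collapse whitespace (a deliberately naive cleanup)."""
    return " ".join(address.lower().replace(",", " ").split())


def ships_to_own_address(billing_address, shipping_address):
    """Return 1 if both addresses match after normalization, else 0."""
    same = normalize_address(billing_address) == normalize_address(shipping_address)
    return 1 if same else 0


print(ships_to_own_address("12 Main St, Springfield", "12 main st  springfield"))
print(ships_to_own_address("12 Main St, Springfield", "99 Oak Ave, Shelbyville"))
```

The resulting 0/1 value is numeric, so it can be stored on the Order node and projected into memory like any other numeric property.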
Since we are using Relation as the target variable, let's label encode the relation type and add it as a class label property on the Order nodes that have a relationship to a Relation node (the labelled examples). For all other orders, set the class label property to 0 (or any other number that is not one of the label-encoded relation types).
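A small sketch of that encoding step, assuming the orders have already been read into a dictionary that maps an order id to its relation type, with None for orders that have no Relation. The ids and the dictionary layout are hypothetical.

```python
UNLABELLED = 0  # reserved class label for orders without a Relation node


def encode_relations(order_relations):
    """Map relation types to integers starting at 1 and label every order.

    order_relations: {order_id: relation_type or None}
    Returns (codes, labels): the type-to-integer mapping and the per-order labels.
    """
    relation_types = sorted({r for r in order_relations.values() if r is not None})
    codes = {name: i for i, name in enumerate(relation_types, start=1)}
    labels = {
        order: codes[relation] if relation is not None else UNLABELLED
        for order, relation in order_relations.items()
    }
    return codes, labels


orders = {"o1": "Mother", "o2": None, "o3": "Father", "o4": "Mother"}
codes, labels = encode_relations(orders)
print(codes)
print(labels)
```

Starting the codes at 1 keeps 0 free for the unlabelled orders, so the two groups cannot be confused later.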
Now, project a graph with Customer, Session and Order nodes along with the properties of interest into memory. Since we are not using Session nodes in our prediction task, we can collapse the path between Customer and Order nodes. One customer can connect to multiple orders via multiple session nodes and orders are unique. Collapse path procedure will not result in multiple relationships between a customer and an order node and hence, aggregation is not needed.
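To show what collapsing the Customer -> Session -> Order path amounts to, here is a plain-Python sketch over edge lists. It is not the GDS procedure itself, only an illustration of the result it should give; the node ids are made up.

```python
def collapse_paths(has_session, contains_order):
    """Replace customer->session->order paths with direct customer->order pairs.

    has_session: list of (customer, session) pairs
    contains_order: list of (session, order) pairs
    """
    orders_by_session = {}
    for session, order in contains_order:
        orders_by_session.setdefault(session, []).append(order)

    collapsed = set()  # a set, so a repeated pair would be kept only once
    for customer, session in has_session:
        for order in orders_by_session.get(session, []):
            collapsed.add((customer, order))
    return sorted(collapsed)


has_session = [("c1", "s1"), ("c1", "s2"), ("c2", "s3")]
contains_order = [("s1", "o1"), ("s2", "o2"), ("s3", "o3")]
print(collapse_paths(has_session, contains_order))
```

Because each order belongs to exactly one session, every collapsed pair appears once, which is why no aggregation is needed.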
You can now use the Node classification ML pipeline in the Neo4j GDS library to generate embeddings. Use the embedding property on the Order nodes as the feature vector and the class label property as the target, and train a multi-class classification model that predicts the class a particular order belongs to, or the likelihood that a particular order is intended for some relation type.
This use case is not supported by the latest stable release of GDS (2.1.11, at the time of writing). In GDS pipelines, we assume a homogeneous graph, where the training algorithm will consider each node as the same type as any other node, and similarly for relationships.
However, we are currently building features to support heterogeneous use cases. In 2.2 we will add so-called context configuration, where you can direct your training algorithm to attempt to learn only a specific relationship type between specific source and target node labels, while still allowing the feature-producing node property steps to use the richer graph.
This will be effective relative to the node features you are using -- if you are using an embedding, you must know that these are still homogeneous and will not likely be able to tell the various different relationship types apart (except for GraphSAGE). Even if you do use them, you will only get the predictions for the relevant label-type-label triple which you specified for training. But I would recommend thinking about what features to use and how to tune your models effectively.
You can already try out the 2.2 features by using our alpha releases -- find our latest alpha through this download link. Preview documentation is available here. Note that this is preview software and that the API may change a lot before the final 2.2.0 release.
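To illustrate what restricting predictions to one label-type-label triple means, here is a sketch that post-filters a list of predicted pairs by node label. This is only a client-side illustration and not the context configuration of GDS 2.2; the node ids, labels and probabilities are invented.

```python
def filter_predictions(predictions, labels, source_label, target_label):
    """Keep only predicted links whose endpoints carry the wanted labels.

    predictions: list of (source_id, target_id, probability)
    labels: {node_id: label}
    """
    return [
        (source, target, probability)
        for source, target, probability in predictions
        if labels[source] == source_label and labels[target] == target_label
    ]


labels = {"o1": "Order", "o2": "Order", "r1": "Relation", "c1": "Customer"}
predictions = [("o1", "r1", 0.91), ("o1", "o2", 0.40), ("c1", "r1", 0.75)]
print(filter_predictions(predictions, labels, "Order", "Relation"))
```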
I have a dataset that contains many features. One feature contains a list of values in each data point. It can look like this:
A B C
1 2 [3,4,5]
So how can we handle feature C for a recommendation system? I know about one-hot encoding, but feature C doesn't have a finite set of values. C contains ID numbers of others, so it can grow larger and larger over time. Is there any solution for dealing with this type of feature?
From what you describe, and since you mention a recommendation system, I would consider your dataset an example of the following:
Each row is a user; features A and B are, for instance, the user's personal information, and feature C is the items the user bought. Of course, feature C doesn't contain the same number of items in each row, and it can expand.
I would build two different recommendation models and combine them afterward: one for features A and B, and another for feature C.
Since feature C evolves with time, you can rebuild the model on a regular schedule (taking a snapshot of feature C) or whenever some 'event' triggers the building process. In my example, feature C is a user-item matrix.
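A minimal sketch of turning a snapshot of feature C into a user-item matrix, assuming each row is a (user id, list of item ids) pair. The column index is built from whatever ids are present in the snapshot, so new ids simply get new columns the next time the model is rebuilt.

```python
def build_user_item(rows):
    """Index the item ids seen in this snapshot and record each user's items.

    rows: iterable of (user_id, item_ids)
    Returns (item_index, user_items): item id -> column, user id -> set of columns.
    """
    item_index = {}
    user_items = {}
    for user, items in rows:
        for item in items:
            item_index.setdefault(item, len(item_index))
        user_items[user] = {item_index[item] for item in items}
    return item_index, user_items


def to_dense(user_items, n_items):
    """Expand the sparse sets into 0/1 rows, one row per user."""
    return {
        user: [1 if col in cols else 0 for col in range(n_items)]
        for user, cols in user_items.items()
    }


snapshot = [(1, [3, 4, 5]), (2, [5, 7])]
item_index, user_items = build_user_item(snapshot)
print(to_dense(user_items, len(item_index)))
```

For a large catalogue you would keep the sparse form and never build the dense rows; they are shown here only to make the matrix visible.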
I am developing a Neo4j database that will contain genomic and clinical data for cancer patients. A common design issue in developing graph databases is whether a data item should be represented by a Node or by a property within a Node. In my case, patients will have hundreds of clinical and demographic measurements (e.g. sex, medications, tumor size). Some of these data will be constant (e.g. sex) while others will be subject to variation with each patient visit. The responses I've seen to previous node vs property questions have recommended using the anticipated queries against the data to make the decision. I think I can identify some properties that will be common search criteria and should be nodes (e.g. smoking history, sex, cancer type) but that still leaves me with hundreds of other properties. Is there a practical limit in Neo4j for the number of properties that a Node should contain? Also, a hybrid approach, where some data are properties and others are Nodes would seem to make both loading data from source files and subsequent queries more complicated.
The main idea behind "look at your queries to decide" is that how data items relate to each other affects whether a node or a property is better. Really, the main point of a graph database is to make walking relationships easier to query. So the real question you should ask yourself is "Does (a)-->()<--(b) have any significant meaning?" In other words, do I need to be able to find other nodes that share this value?
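To make the (a)-->()<--(b) question concrete, here is a plain-Python sketch of the walk it describes: from one node, through a shared value node, to the other nodes attached to it. The names and edges are invented for illustration.

```python
from collections import defaultdict


def shared_value_neighbours(edges, start):
    """Return the other nodes that point at any value node `start` points at.

    edges: list of (node, value_node) pairs, i.e. (a)-->(value)
    """
    nodes_by_value = defaultdict(set)
    values_by_node = defaultdict(set)
    for node, value in edges:
        nodes_by_value[value].add(node)
        values_by_node[node].add(value)

    neighbours = set()
    for value in values_by_node[start]:
        neighbours |= nodes_by_value[value]
    neighbours.discard(start)  # a node is not its own neighbour
    return sorted(neighbours)


edges = [("alice", "addr1"), ("bob", "addr1"), ("carol", "addr2")]
print(shared_value_neighbours(edges, "alice"))
```

If you never need this kind of walk for a given value, that is a hint the value can live as a plain property.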
Here are some quick rule-of-thumb guidelines:
Node
Has its own sub-values or relations
Multiple nodes sharing this value has meaning, and you need to be able to walk along this shared value between them
Changes infrequently
If more than 1 value can apply at the same time
Properties
Has a large range of possible values
Changes over time
If more than 1 value can apply, values are usually updated as a set (rather than individually)
Label
Has a small, finite range of mutually exclusive values
Almost never changes
So let's go through the thought process for a few of your properties...
Sex
Will either be "Male" or "Female", and everyone will be connected to one of the two, so they will both end up being super nodes (overloaded). Also, if you do end up needing to find two people that share the same sex, almost any other method would be more efficient than finding them through the super node. However these are mutually exclusive, immutable, genetic traits so making this a label is also perfectly acceptable (and sometimes preferred).
Address
This is a variable value with sub-properties, won't be shared by very many nodes, and the walk from one person to another at the same address (or, by extension, living in the same area) has valuable meaning. So this should almost definitely be a node.
Height and Weight
These change constantly with time, have no sub-values, and two people sharing this value has little to no meaning. The range of values is far too wide, so labels make no sense either. This should be a property.
Blood type
While this has more options than Sex does, all the same logic applies, except that the relation does matter now (because people must have compatible blood types to donate). The problem is that this value will be so overloaded that you will need to filter on area first, and then just verify blood type. Could be a property or label. The case for a node is if you include a "Can_Donate_To" or "Can_Accept" relation between the blood types. While you likely won't walk these relations to find a potential donor (because they are too overloaded, and you will have to filter by area first), you can use them to verify that someone can be a donor.
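As a sketch of what a "Can_Donate_To" check verifies, the following uses the simplified ABO/Rh rule for red cells: a donor is compatible when the donor's antigens are a subset of the recipient's. The function names are mine, and this is an illustration of the modelling idea, not clinical guidance.

```python
def antigens(blood_type):
    """Split a type such as 'AB+' into its antigen set, e.g. {'A', 'B', 'Rh'}."""
    group, rhesus = blood_type[:-1], blood_type[-1]
    found = set(group) - {"O"}  # group O carries neither A nor B
    if rhesus == "+":
        found.add("Rh")
    return found


def can_donate_to(donor, recipient):
    """True if every antigen of the donor is also present in the recipient."""
    return antigens(donor) <= antigens(recipient)


print(can_donate_to("O-", "AB+"))
print(can_donate_to("A+", "A-"))
```

In the graph, the same table could be stored once as relations between eight blood type nodes and consulted after the area filter has narrowed down the candidates.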
Social Security Number
Is highly sensitive, and a lawsuit waiting to happen. Keep it out of the DB if at all possible. If you absolutely have to store it: this value is immutable but unique to every person, so because of the lack of reuse it is a bad label and would be pointless as a node. Definitely a property. (But it should be salted and hashed if it is used only for verification.)
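A minimal sketch of the salt-and-hash idea using Python's standard library. PBKDF2 with 200,000 iterations is my choice for the example, not a recommendation from any standard, and the identifier shown is an obviously fake placeholder.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch


def hash_identifier(value, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_identifier(value, salt, expected_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_identifier(value, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_identifier("000-00-0000")
print(verify_identifier("000-00-0000", salt, stored))
print(verify_identifier("111-11-1111", salt, stored))
```

Only the salt and the digest would be stored as properties on the node; the original value never enters the database.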
Mother's maiden name
The possible values are endless, and two nodes sharing this value has no real meaning. Definitely a property.
First born child
Since the child is already their own node, with its own sub-properties, just create a relation between the two. While the value of this info is questionable, any time you need to reference another node, always use a relationship for it. Definitely a node.
I am teaching myself graph modelling, using a Neo4j 2.2.3 database with Node.js and the Express framework.
I have skimmed through the free Neo4j graph database book and learned how to model a scenario, when to use relationships and when to create nodes, etc.
I have modelled a vehicle selling scenario with the following structure:
NODES
(:VEHICLE{mileage:xxx, manufacture_year: xxxx, price: xxxx})
(:VFUEL_TYPE{type:xxxx}) x 2 (one for diesel and one for petrol)
(:VCOLOR{color:xxxx}) x 8 (red, green, blue, .... yellow)
(:VGEARBOX{type:xxx}) x 2 (AUTO, MANUAL)
RELATIONSHIPS
(vehicleNode)-[:VHAVE_COLOR]->(colorNode - either of the colors)
(vehicleNode)-[:VGEARBOX_IS]->(gearboxNode - either manual or auto)
(vehicleNode)-[:VCONSUMES_FUEL_TYPE]->(fuelNode - either diesel or petrol)
Assuming we have the above structure and so on for the rest of the features.
As shown in the above screenshot (136 & 137 are VEHICLE nodes), the majority of a vehicle's features are created as separate nodes and shared, via relationships, among vehicles that have the feature in common.
Could you please advise whether roles (labels) like color, body type, driving side (left-hand or right-hand drive), gearbox and others should be separate nodes or properties of the vehicle node? Which option performs better and is easier to query?
I want to write JS code that allows querying the graph with the above structure using one or many search criteria. If the majority of those features were properties of the VEHICLE node, querying would not be difficult:
MATCH (v:VEHICLE) WHERE v.gearbox = "MANUAL" AND v.fuel_type = "PETROL" AND v.price > x AND v.price < y AND .... RETURN v;
However, with my existing graph model it is tricky to search, especially when there are multiple criteria that are not properties of the VEHICLE node but separate nodes linked via relationships.
Any ideas and advice on how to make the existing graph structure more queryable and performance friendly would be much appreciated. A scenario with 1000 VEHICLE nodes would generate 15000 relationships, which sounds a bit scary, and if it reaches a million VEHICLE nodes, that means up to 15 million relationships. Please comment if I am heading in the wrong direction.
Thank you for your time.
Modeling is full of tradeoffs; it looks like you have a decent start.
Don't be concerned with the number of relationships. That's what graph databases are good at, so I wouldn't worry about over-using them.
Should something be a property, or a node? I can't answer for your scenario, but here are some things to consider:
If you look something up by a value all the time, and you have many objects, it's usually going to be faster to find one node and then everything connected to it, because graph DBs are good at exploiting relationships. It's less fast to scan all nodes of a label and find the items where a property=a value.
Relationships work well when you want to express a connection to something that isn't a simple primitive data type. For example, take "gearbox". There are manuals and other types. If it's a property value, you won't easily be able to decide later to store 4 other sub-types/sub-aspects of "gearbox". If it were a node, that would be easy later, because you could add more properties to the node or relate other things to it.
If a piece of data really is a primitive (String, integer, etc) and you don't need extra detail about it, that usually makes a good property. Querying primitive values by connecting to other nodes will seem clunky later on. For example, I wouldn't model a person with a "date of birth" as a separate node, that would be irritating to query, and would give you flexibility you'd be very unlikely to need in the future.
Semantically, how is your data related? If two items are similar because they share an X, then that X probably should be a node. If two items happen to have the same Y value but that doesn't really mean much, then Y is probably better off as a node property.
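For the multi-criteria search mentioned in the question, here is a sketch of building a parameterized Cypher query for the case where the features are properties of the VEHICLE node. The function name and argument layout are my own. The $name parameter syntax is the one used by current Neo4j versions; as far as I know, older 2.x releases such as the one in the question expect {name} instead. Property names are interpolated into the query text, so they should come from a fixed whitelist and never from user input.

```python
def build_vehicle_query(equals=None, ranges=None):
    """Build a Cypher query string plus its parameter map.

    equals: {property: value} for exact matches
    ranges: {property: (low, high)} for exclusive lower and upper bounds
    """
    clauses, params = [], {}
    for prop, value in (equals or {}).items():
        clauses.append(f"v.{prop} = ${prop}")
        params[prop] = value
    for prop, (low, high) in (ranges or {}).items():
        clauses.append(f"v.{prop} > ${prop}_min AND v.{prop} < ${prop}_max")
        params[f"{prop}_min"] = low
        params[f"{prop}_max"] = high
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"MATCH (v:VEHICLE){where} RETURN v", params


query, params = build_vehicle_query(
    equals={"gearbox": "MANUAL", "fuel_type": "PETROL"},
    ranges={"price": (1000, 5000)},
)
print(query)
print(params)
```

The same idea carries over to JS: collect the clauses that apply, join them with AND, and pass the values as parameters rather than concatenating them into the query string.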
I am new to Neo4j and I need some advice from the more experienced Neo4j developers.
In which situations does it make sense for an inventory system to represent individual items as a path through their properties instead of as a node with the same properties?
To make myself clear:
Let's say we have an eyeglass lens. This item has properties like its SPHERE power, its CYLINDER power and an AXIS, among others.
There is a finite set of SPHERE powers, and likewise of CYLINDER powers and AXIS values. The combination of those makes an item (lens).
Does it make sense to represent a lens like this:
MATCH (lens:Lens)-[:`-2.00`]-(sph:Sphere {power:'-2.00'})-[:`-0.50`]-(cyl:Cylinder {power:'-0.50'})-[:`90`]-(ax:Axis {degree:'90'})
RETURN lens.brand_name, lens.price
Please note that the above item (lens) can be available from different manufacturers and with different brand names and list prices, so "lens" will represent all individual brands that match the above query and will have at least the brand name and price as properties.
Let's say you have a piece of data ("SPHERE"). When should it be a property of the lens node, and when should it be its own node, via relation?
Do you need to relate multiple lenses to the same sphere? This argues it should be its own node, so that multiple lenses can link to the same sphere.
Do you need to assert extra properties about the sphere value? (Like who measured it, or when?) This argues you should make it a separate node.
Do you need to store properties about the relationship? If the relationship is any more complicated than simple "HAS A" you might want a relationship between two nodes, so you can store properties on the relationship.
Any of those cases would argue you should store that piece of data as a separate node, and then relate it by relationship.
ON THE OTHER HAND, if it's a simple primitive data type (float), with a simple "HAS-A" relationship to the parent (i.e. a lens HAS-A sphere measurement) and you have no need for extra metadata, then it should be a node property.
I'm not an optometrist, but I think this latter situation is your case; I'm just trying to give you a more general answer. "Sphere" should probably be a node property, but the cases above are how to think about the issue more generally for future data items.
In your special domain, with finite ranges and discrete values for each of the parameters, it absolutely makes sense to model the properties of a lens as value nodes. The resulting index graph seems not to be too large, and quite balanced (no supernodes).
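To illustrate why a finite set of discrete values works well as an index, here is a plain-Python sketch that groups lenses by their (sphere, cylinder, axis) combination, which is roughly what walking the value nodes achieves in the graph. The brand names and prices are invented.

```python
from collections import defaultdict


def build_lens_index(lenses):
    """Group lenses by their (sphere, cylinder, axis) combination."""
    index = defaultdict(list)
    for lens in lenses:
        key = (lens["sphere"], lens["cylinder"], lens["axis"])
        index[key].append(lens)
    return index


lenses = [
    {"brand_name": "BrandA", "price": 40, "sphere": "-2.00", "cylinder": "-0.50", "axis": "90"},
    {"brand_name": "BrandB", "price": 55, "sphere": "-2.00", "cylinder": "-0.50", "axis": "90"},
    {"brand_name": "BrandC", "price": 35, "sphere": "-1.00", "cylinder": "-0.50", "axis": "180"},
]
index = build_lens_index(lenses)
matches = index[("-2.00", "-0.50", "90")]
print([(lens["brand_name"], lens["price"]) for lens in matches])
```

Each key corresponds to one path through the value nodes, and all brands that share the combination hang off the same path.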