Graph as output of a Classifier - machine-learning

It seems that I have a very common task, but I'm missing some keywords that would help me to find the information. So I state my task.
There are Persons. A set of variables is known about each person. A pair of persons P1 and P2 can be in one of the following relationships (which are the classes):
parent-child
siblings
partners (the significant ones)
other (some indirect relative or not a family member)
By selecting some variables of the pairs (Pi, Pk) with known relationships, I can train a Naive Bayes Classifier to predict the class. This is good.
Now I have a set of persons P1, P2, ..., Pm, and I need to build the most probable graph representing the family tree. I could use my Bayes classifier pairwise, but then I would ignore a lot of information that is stored in the graph, i.e. in the combinations of several nodes.
For example, nodes P1, P2, P3 and P4 are given. My Bayes classifier thinks, with a good probability of 0.9, that P2 is the parent of P1 and P4 is the parent of P3. As for the relationship between P1 and P3, it returns p = 0.31 for siblings and p = 0.34 for partners, so the result is quite unreliable. Now, if the classification of the relationship between P2 and P4 yields "partners" with a good probability of, say, 0.7, I can be more confident that P1 and P3 are in fact siblings. On the other hand, if P2 and P4 are "other" with a probability of 0.8, it is safer for me to conclude that P1 and P3 are partners.
I could code this logic by hand, but I think there are many more cases and logical dependencies, especially if we want to build a relationship graph for around 10 or 20 persons. Therefore I would like to use some kind of classifier or classifier system.
But the output of this classifier system will not be a binary or scalar value, but a whole graph. What can I use, or where can I start looking?
Thanks!

You want to do some kind of structure learning. Just as graphs are much more complicated than bits, structure learning is much more complicated than classification.
You probably want to find the maximum a posteriori (MAP) family tree, subject to your probabilistic knowledge of the individual relationships. The MAP is the single most likely assignment given all of your knowledge. The general problem of figuring out relationships between probabilistically related items is called probabilistic inference, or sometimes just inference.
I don't know if you can access the course materials of the recently completed probabilistic graphical models class, but that would be well worth studying.
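To make the MAP idea concrete, here is a tiny brute-force sketch in Python. The pair probabilities and the single consistency rule are made up to mirror the P1..P4 example above; a real solution would replace the enumeration with proper probabilistic inference, since enumerating all assignments blows up combinatorially for 10-20 people.

    # Brute-force MAP over joint relationship assignments for a few pairs.
    # pair_probs would come from the pairwise Naive Bayes classifier; the
    # numbers below are invented to mirror the example in the question.
    from itertools import product

    CLASSES = ["parent-child", "siblings", "partners", "other"]

    pair_probs = {
        ("P2", "P1"): {"parent-child": 0.90, "siblings": 0.03, "partners": 0.03, "other": 0.04},
        ("P4", "P3"): {"parent-child": 0.90, "siblings": 0.03, "partners": 0.03, "other": 0.04},
        ("P1", "P3"): {"parent-child": 0.05, "siblings": 0.31, "partners": 0.34, "other": 0.30},
        ("P2", "P4"): {"parent-child": 0.05, "siblings": 0.10, "partners": 0.70, "other": 0.15},
    }

    def consistent(assignment):
        # One hand-written constraint as an example: if P2 is a parent of P1,
        # P4 is a parent of P3, and P2 and P4 are partners, then P1 and P3
        # must be siblings.
        rel = dict(assignment)
        if (rel[("P2", "P1")] == "parent-child"
                and rel[("P4", "P3")] == "parent-child"
                and rel[("P2", "P4")] == "partners"):
            return rel[("P1", "P3")] == "siblings"
        return True

    pairs = list(pair_probs)
    best_score, best_assignment = 0.0, None
    for labels in product(CLASSES, repeat=len(pairs)):
        assignment = list(zip(pairs, labels))
        if not consistent(assignment):
            continue
        score = 1.0
        for pair, label in assignment:
            score *= pair_probs[pair][label]
        if score > best_score:
            best_score, best_assignment = score, assignment

    print(best_score, best_assignment)

With these numbers the constrained optimum labels P1 and P3 as siblings, even though the pairwise classifier slightly preferred partners, which is exactly the effect the question is after. A graphical-model formulation achieves the same thing without the exponential enumeration.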

Related

A challenge in the data cleaning process: an ambiguous relationship between the features of the data - supervised learning

I am conducting a supervised learning analysis to investigate the relationship between water salinity and water temperature. My goal is to determine if water temperature can be predicted based on salinity levels.
The dataset I am using is the CalCOFI dataset, which contains oceanographic and larval fish data from around the world. The dataset can be found on Kaggle at this link: https://www.kaggle.com/datasets/sohier/calcofi.
However, I am facing a challenge in the analysis due to the unclear relationship between the features of the data. This ambiguity has left me unsure about how to approach the problem and which model to choose.
I am reaching out to the community for their opinions and to learn as much as I can from this experience. I would appreciate any insights or suggestions you may have.
As depicted in the graph above, the data does not exhibit a clear relationship. This suggests that additional factors are needed to establish a clearer relationship. Further investigation reveals that water temperature is also influenced by water depth.
(https://oceanservice.noaa.gov/facts/coldocean.html)
The red line is the average depth of the data (266.8 m).
As depicted in the graph above, water temperature is also influenced by water depth, with clear polynomial behavior.
However, we can see two "fangs"; the smaller one is probably data offset from the actual measurement environment, etc.
As a result, I tried to eliminate all values above the average water depth in order to clean the data and make it more reliable, but it didn't work :(
I have considered two ways to address the issue. The first approach is to acknowledge that there is a clear relationship between water depth and temperature. We can express this relationship as follows:
f(water_depth) = water_temperature.
Additionally, we can also express the relationship between water depth and salinity as
g(water_depth) = water_salinity.
By combining these two relationships, we can derive the relationship between water salinity and water temperature: f(g^-1(water_salinity)) = water_temperature.
With this information, I plan to create two separate models: one for the relationship between depth and temperature, and one for the relationship between depth and salinity. Then, I will use the mathematical relationship derived above to establish the relationship between water salinity and water temperature.
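A rough sketch of what I have in mind is below; the column names Depthm, T_degC and Salnty are my guesses for the CalCOFI bottle file and may need adjusting, and the degree-3 polynomials are an arbitrary choice.

    # Two polynomial fits: f(depth) = temperature, g(depth) = salinity,
    # composed as f(g^-1(salinity)) = temperature. Column names are guesses.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("bottle.csv", usecols=["Depthm", "T_degC", "Salnty"]).dropna()

    f = np.poly1d(np.polyfit(df["Depthm"], df["T_degC"], deg=3))   # depth -> temperature
    g = np.poly1d(np.polyfit(df["Depthm"], df["Salnty"], deg=3))   # depth -> salinity

    # g has no closed-form inverse, so approximate g^-1 on a grid of depths.
    depth_grid = np.linspace(df["Depthm"].min(), df["Depthm"].max(), 2000)
    sal_grid = g(depth_grid)
    order = np.argsort(sal_grid)   # np.interp needs increasing x values

    def predict_temperature(salinity):
        depth = np.interp(salinity, sal_grid[order], depth_grid[order])   # g^-1
        return f(depth)                                                    # f(g^-1(s))

    print(predict_temperature(34.0))

This only behaves sensibly on ranges where g is (roughly) monotonic, which is part of why I suspect the approach is fragile.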
However, I have concerns that this approach may be too complex and inelegant. Additionally, I would expect a mathematical relationship based on polynomial equations to have a different appearance than the graphs I have seen. I may be mistaken, but this leads me to question the validity of this approach.
The alternative method I have considered is to disregard the relationship between the features and simply use the first 1000 cases to test various models and evaluate their error scores. While this method may yield results, it goes against my desire to have a deeper understanding of the relationship between water salinity and water temperature.

Regarding prediction of Decision Tree

How does a decision tree predict the outcome on a new data set? Let's say that with the hyperparameters I allowed my decision tree to grow only to a certain extent to avoid overfitting. Now a new data point is passed to this trained model, so the new data point reaches one of the leaf nodes. But how does that leaf node predict whether the data point is either 1 or 0? (I am talking about classification here.)
Well, you pretty much answered your own question. But just to extend it: how the data ends up labelled 0 or 1 in the leaf is highly dependent on the type of algorithm you used. For example, ID3 uses the mode (majority) value to predict; similarly, C4.5, C5.0 and CART have their different criteria based on information gain, the Gini index, etc.
In simplified terms, the process of training a decision tree and predicting the target features of query instances is as follows:
Present a dataset containing a number of training instances characterized by a number of descriptive features and a target feature
Train the decision tree model by repeatedly splitting the dataset along the values of the descriptive features, using a measure of information gain during the training process
Grow the tree until we reach a stopping criterion --> create leaf nodes which represent the predictions we want to make for new query instances
Show query instances to the tree and run down the tree until we arrive at leaf nodes
DONE - Congratulations you have found the answers to your questions
Here is a link I suggest, which explains decision trees in great detail, from scratch. Give it a good read:
https://www.python-course.eu/Decision_Trees.php
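If it helps to see the "mode value at the leaf" idea in code, here is a tiny scikit-learn sketch on synthetic data; the depth limit keeps the leaves impure, as in the question.

    # The leaf a query point lands in predicts the majority class of the
    # training samples that ended up in that same leaf.
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    x_new = X[:1]
    print("leaf id:", tree.apply(x_new)[0])                       # which leaf it reached
    print("class fractions in leaf:", tree.predict_proba(x_new)[0])
    print("predicted class (majority):", tree.predict(x_new)[0])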

Can a decision tree based model predict the future?

I am trying to build a model that predicts the shipping volume of each month, week, and day.
I found that the decision tree-based model works better than linear regression.
But I read some articles about machine learning that say a decision tree based model can't predict future values which the model didn't learn (extrapolation issues).
So I think it means that if the data falls between the dates that the training data has, the model can predict well, but if the date of the data is out of that range, it cannot.
I'd like to confirm whether my understanding is correct.
Some postings show predictions for datetime-based data using a random forest model, and that confuses me.
Also please let me know if there is any way to overcome extrapolation issues on decision tree based model.
It depends on the data.
A decision tree predicts the class value of any sample within the range [minimum of the class values in the training data, maximum of the class values in the training data]. For example, suppose there are five samples [(X1, Y1), (X2, Y2), ..., (X5, Y5)], and the trained tree has two leaf nodes. The first node N1 includes (X1, Y1) and (X2, Y2), and the other node N2 includes (X3, Y3), (X4, Y4) and (X5, Y5). Then the tree will predict a new sample as the mean of Y1 and Y2 when the sample reaches N1, but it will predict a new sample as the mean of Y3, Y4 and Y5 when the sample reaches N2.
For this reason, if the class value of a new sample could be bigger than the maximum class value in the training data, or smaller than the minimum, it is not recommended to use a decision tree. Otherwise, tree-based models such as random forests show good performance.
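A quick toy sketch with scikit-learn shows the effect (made-up data, purely illustrative):

    # A regression tree cannot predict outside the range of target values
    # it saw during training, no matter how far the inputs extrapolate.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X_train = np.arange(0, 100).reshape(-1, 1)   # e.g. days 0..99
    y_train = X_train.ravel() * 2.0              # a simple upward trend

    tree = DecisionTreeRegressor().fit(X_train, y_train)

    X_future = np.array([[150], [200], [500]])   # dates beyond the training range
    print(tree.predict(X_future))                # all stuck at 198, the training maximum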
There can be different forms of extrapolation issues here.
As already mentioned a classical decision tree for classification can only predict values it has encountered in its training/creation process. In that sense you won't predict any previously unseen values.
This issue can be remedied if you have the classifier predict relative updates instead of absolute values. But you need to have some understanding of your data, to determine what works best for different cases.
Things are similar for a decision tree used for regression.
The next issue with "extrapolation" is that decision trees might perform badly if your training data has changing statistics over time. Again, I would propose to predict update relationships.
Otherwise, predictions based on training data from a more recent past might yield better predictions. Since individual decision trees can't be trained in an online manner, you would have to create a new decision tree every x time steps.
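As a rough sketch of the "relative updates" idea (the feature choice and numbers are only illustrative): train on the day-over-day change instead of the absolute volume, then add the predicted change back onto the last observed value.

    # Predict day-over-day changes rather than absolute shipping volume,
    # so the model can still produce values above anything it has seen.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    volume = np.array([100, 103, 108, 110, 115, 121, 125, 131, 136, 140], dtype=float)

    diffs = np.diff(volume)
    X = diffs[:-1].reshape(-1, 1)   # feature: yesterday's change (illustrative)
    y = diffs[1:]                   # target: today's change

    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    predicted_change = model.predict([[diffs[-1]]])[0]
    print("next value estimate:", volume[-1] + predicted_change)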
Going further than this, I'd say you'll want to start thinking in state machines and trying to use your classifier for state predictions. But this is a fairly uncharted domain of theory for decision trees, as of when I last checked. This will work better if you already have some form of model for your data relationships in mind.

How to find probability of predicted weight of a link in weighted graph

I have an undirected weighted graph. Let's say node A and node B don't have a direct link between them, but there are paths connecting both nodes through other intermediate nodes. Now I want to predict the possible weight of the direct link between nodes A and B, as well as the probability of it.
I can predict the weight by finding the possible paths and their average weight, but how can I find the probability of it?
The problem you are describing is called link prediction. Here is a short tutorial explaining about the problem and some simple heuristics that can be used to solve it.
Since this is an open-ended problem, these simple solutions can be improved a lot by using more complicated techniques. Another approach for predicting the probability for an edge is to use Machine Learning rather than rule-based heuristics.
A recent article called node2vec proposed an algorithm that maps each node in a graph to a dense vector (aka embedding). Then, by applying some binary operator to a pair of nodes, we get an edge representation (another vector). This vector is then used as input features to some classifier that predicts the edge probability. The paper compared a few such binary operators over a few different datasets, and significantly outperformed the heuristic benchmark scores across all of these datasets.
The code to compute embeddings given your graph can be found here.
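A rough sketch of that pipeline is below, assuming the third-party networkx and node2vec Python packages; the Hadamard product is one of the binary operators compared in the paper, and all hyperparameters here are illustrative.

    # Link prediction via node embeddings: node2vec embeddings, Hadamard
    # product as the edge representation, logistic regression as the classifier.
    import networkx as nx
    import numpy as np
    from node2vec import Node2Vec
    from sklearn.linear_model import LogisticRegression

    G = nx.karate_club_graph()   # stand-in for your weighted graph

    embeddings = Node2Vec(G, dimensions=32, walk_length=20, num_walks=50,
                          workers=1).fit(window=5, min_count=1).wv

    def edge_features(u, v):
        return embeddings[str(u)] * embeddings[str(v)]   # Hadamard operator

    non_edges = list(nx.non_edges(G))
    pos = list(G.edges())                  # existing edges as positives
    neg = non_edges[:len(pos)]             # sampled non-edges as negatives
    X = np.array([edge_features(u, v) for u, v in pos + neg])
    y = np.array([1] * len(pos) + [0] * len(neg))

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    u, v = non_edges[-1]   # an unconnected pair not used in training
    print((u, v), "edge probability:", clf.predict_proba([edge_features(u, v)])[0, 1])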

Convert Decision Table To Decision Tree

How can I convert or visualize a decision table as a decision tree graph?
Is there an algorithm to solve this, or software to visualize it?
For example, I want to visualize my decision table below:
http://i.stack.imgur.com/Qe2Pw.jpg
Gotta say that is an interesting question.
I don't know the definitive answer, but I'd propose such a method:
use a Karnaugh map to turn your decision table into a minimized boolean function
turn your function into a tree
Let's simplify with an example, and assume that the Karnaugh map got you the function (a and b) or c or d. You can turn that into a tree as:
[Tree diagram. Source: my own]
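The tree image is not reproduced here, but as a rough textual equivalent, one possible tree for (a and b) or c or d can be written as nested tests (each if-level is an internal node):

    # One possible decision tree for the boolean function (a and b) or c or d.
    def decide(a, b, c, d):
        if c:             # internal node: test c
            return True
        if d:             # internal node: test d
            return True
        if a:             # internal node: test a
            return b      # final node tests b
        return False

    print(decide(a=True, b=True, c=False, d=False))   # True, via the a-and-b branch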
It certainly is easier to generate a decision table from a decision tree, not the other way around.
But the way I see it, you could convert your decision table to a data set. Let 'Disease' be the class attribute and treat the evidence as simple binary instance attributes. From that you can easily generate a decision tree using one of the available decision tree induction algorithms, for example C4.5. Just remember to disable pruning and lower the minimum-number-of-objects parameter.
During that process you would lose a bit of information, but the accuracy would remain the same. Take a look at both rows describing disease D04 - the second row is in fact more general than the first. A decision tree generated from this data would recognize the mentioned disease from the E11, E12 and E13 attributes alone, since that is enough to correctly label the instance.
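A minimal sketch of that idea in Python, using scikit-learn's CART as a stand-in for C4.5 (the table rows below are invented, not the actual table from the question):

    # Treat each decision-table row as a training instance: the evidence
    # columns are binary features and 'Disease' is the class attribute.
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    table = pd.DataFrame([
        {"E11": 1, "E12": 1, "E13": 0, "Disease": "D01"},
        {"E11": 0, "E12": 1, "E13": 1, "Disease": "D02"},
        {"E11": 1, "E12": 0, "E13": 1, "Disease": "D03"},
        {"E11": 1, "E12": 1, "E13": 1, "Disease": "D04"},
    ])

    X = table[["E11", "E12", "E13"]]
    y = table["Disease"]

    # No pruning and a minimum leaf size of 1, as suggested above.
    tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=1).fit(X, y)
    print(export_text(tree, feature_names=list(X.columns)))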
I've spent a few hours looking for a good algorithm, but I'm happy with my results.
My code is too dirty to paste here (I can share it privately on request, at your discretion), but the general idea is as follows.
Assume you have a data set with some decision criteria and an outcome.
1. Define a tree structure (e.g. data.tree in R) and create a "Start" root node.
2. Calculate the outcome entropy of your data set. If the entropy is zero, you are done.
3. Using each criterion, one by one, as a tree node, calculate the entropy of all branches created with this criterion. Take the minimum entropy over all branches.
4. Branches created with the criterion with the smallest (minimum) entropy are your next tree node. Add them as child nodes.
5. Split your data according to the decision point/tree node found in step 4 and remove the criterion used.
6. Repeat steps 2-5 for each branch until all your branches have entropy = 0.
Enjoy your ideal decision tree :)
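For anyone who prefers Python over R's data.tree, here is a compact sketch of the same entropy-driven procedure (the toy rows are invented):

    # Rough Python sketch of the steps above: stop when the outcome entropy is
    # zero, otherwise split on the criterion whose branches have minimum entropy.
    import math
    from collections import Counter

    def entropy(rows, outcome):
        counts = Counter(r[outcome] for r in rows)
        return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

    def branch_entropy(rows, criterion, outcome):
        groups = Counter(r[criterion] for r in rows)
        return sum(n / len(rows) * entropy([r for r in rows if r[criterion] == v], outcome)
                   for v, n in groups.items())

    def build_tree(rows, criteria, outcome):
        if entropy(rows, outcome) == 0 or not criteria:            # step 2: done if pure
            return rows[0][outcome]
        best = min(criteria, key=lambda c: branch_entropy(rows, c, outcome))  # steps 3-4
        rest = [c for c in criteria if c != best]                   # step 5: remove criterion
        return {f"{best}={v}": build_tree([r for r in rows if r[best] == v], rest, outcome)
                for v in {r[best] for r in rows}}                   # step 6: recurse per branch

    data = [  # invented toy data set
        {"fever": "yes", "cough": "yes", "disease": "flu"},
        {"fever": "yes", "cough": "no",  "disease": "cold"},
        {"fever": "no",  "cough": "yes", "disease": "cold"},
        {"fever": "no",  "cough": "no",  "disease": "healthy"},
    ]
    print(build_tree(data, ["fever", "cough"], "disease"))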
