Using neural networks for classification in Hierarchical data - machine-learning

I am stuck on a problem with hierarchical data, say A -> B -> C (smallest to biggest). The smallest unit of data is a block (A consists of multiple blocks, B consists of multiple A's, and C consists of multiple B's), and I want to classify blocks into labels. The block labels for each group of A are independent of the block labels for another group of A; however, the "trends or patterns" followed by the data can be similar, and that is what is to be learnt. The complexity I am facing is variable input sizes. I cannot possibly train a single neural network per group of A, since there are a large number of them. So I am thinking in terms of groups at level B, but how could I create a scheme that handles these variable input sizes? Each block is represented by a one-dimensional array whose length is the total number of labels in the group of A it belongs to. I also have the hierarchy information for every block (smallest unit). Any help would be appreciated. Thanks!

Related

Regarding prediction of Decision Tree

How does a decision tree predict the outcome on a new data set? Let's say that with the hyperparameters I allowed my decision tree to grow only to a certain extent to avoid overfitting. Now a new data point is passed to this trained model, so the new data point reaches one of the leaf nodes. But how does that leaf node predict whether the data point is a 1 or a 0? (I am talking about classification here.)
Well, you pretty much answered your own question. Just to extend it: how the data is finally labelled 0 or 1 depends on the algorithm you used. For example, ID3 uses the mode (majority) value at the leaf to predict; similarly, C4.5, C5.0 and CART have their own criteria based on information gain, the Gini index, and so on.
In simplified terms, the process of training a decision tree and predicting the target features of query instances is as follows:
Present a dataset containing a number of training instances, each characterized by a number of descriptive features and a target feature
Train the decision tree model by repeatedly splitting the dataset along the values of the descriptive features, using a measure such as information gain to choose the splits
Grow the tree until we meet a stopping criterion --> create leaf nodes which represent the predictions we want to make for new query instances
Present query instances to the tree and run them down the tree until we arrive at leaf nodes
DONE - Congratulations, you have found the answers to your questions
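The steps above can be sketched with scikit-learn. This is a minimal toy example (the data is made up): a depth-limited tree is trained, a query instance runs down to a leaf, and the leaf predicts the majority class of the training instances that landed in it.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy 1-D data: class 0 near small values, class 1 near large values.
X = np.array([[0], [1], [2], [3], [10], [11], [12], [13]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# A query instance runs down to one leaf; the class distribution stored in
# that leaf (its majority class, for ID3/CART-style trees) gives the label.
query = np.array([[2.5]])
leaf = tree.apply(query)[0]                    # index of the leaf reached
majority = tree.classes_[np.argmax(tree.tree_.value[leaf][0])]
prediction = tree.predict(query)[0]            # same answer via the API
```

Inspecting `tree.tree_.value[leaf]` directly shows the stored class distribution that `predict` summarizes with an argmax.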
Here is a link I suggest, which explains decision trees in detail, from scratch. Give it a good read:
https://www.python-course.eu/Decision_Trees.php

Is it possible to cluster data with grouped rows of data in unsupervised learning?

I am working to set up data for an unsupervised learning algorithm. The goal of the project is to group (cluster) different customers together based on their behavior on the website. Obviously, some sort of clustering algorithm is best for discovering patterns in the data that we can't see as humans.
However, the database contains multiple rows per customer (in chronological order), one for each action the customer took on the website during that visit. For example, customer ID 123 clicked on page 1 at time X, and that would be a row in the database; then the same customer clicked another page at time Y, which would make another row.
My question is what algorithm or approach would you use for clustering in this given scenario? K-means is really popular for this type of problem, but I don't know if it's possible to use in this situation because of the grouping. Is it somehow possible to do cluster analysis around one specific ID that includes multiple rows?
Any help/direction of unsupervised learning I should take is appreciated.
In short,
Learn a fixed-length embedding (representation) of each event;
Learn a way to combine the sequence of such embeddings into a single representation for each customer, then use your favorite unsupervised methods.
For (1), you can do it either manually or use an encoder/decoder;
For (2), there is a range of things you can do, from simply averaging the embeddings of all events, to training an encoder-decoder to reconstruct the original sequence of events and taking the intermediate representation (the one the decoder uses to reconstruct the sequence).
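A minimal sketch of the simplest version of (2): average a handcrafted per-event embedding for each customer to get one fixed-length vector per customer, then cluster those vectors. The event vectors and customer IDs below are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# customer_id -> chronological list of event embeddings
# (e.g. hand-built features of page and time; values are made up)
sessions = {
    123: [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]],
    456: [[0.0, 1.0], [0.1, 0.9]],
    789: [[1.0, 0.0], [1.0, 0.1]],
}

ids = list(sessions)
# One fixed-length row per customer, regardless of how many events they have.
X = np.array([np.mean(sessions[i], axis=0) for i in ids])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
clusters = dict(zip(ids, labels))
```

Averaging throws away the order of events; the encoder-decoder variant mentioned above is the way to keep sequential structure in the per-customer representation.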
A good read on this topic (though a bit old; you now also have the option of Transformer networks):
Representations for Language: From Word Embeddings to Sentence Meanings

Classify visually distinct objects as one class

We are building a neural network to classify objects and have a large dataset of images for 1000 classes. One of the classes is “banana” and it contains 1000 images of banana. Some of those images (about 10%) are of mashed bananas, which are visually very different from the rest of the images in that class.
If we want both mashed bananas and regular bananas to be classified, should we split the banana images into two separate classes and train separately, or keep the two subsets merged?
I am trying to understand how the presence of a visually distinct subclass impacts the recognition of a given class.
The problem here is simple. You need your neural network to learn both groups of images. That means you need to back-propagate sensible error information. If you do have the ground truth information about mashed bananas, back-propagating that is definitely useful. It helps the first layers learn two sets of features.
Note that the nice thing about neural networks is that you can back-propagate any kind of error vector. If your output has three nodes (banana, non-mashed banana, mashed banana), you basically sidestep the binary choice implied in your question. You can always drop output nodes during inference.
There is no standard answer here; it may be very hard for your network to generalize over classes whose subclasses are distinct in feature space, in which case introducing multiple dummy classes that you collapse into a single one via post-processing would be the ideal solution.

You could also pretrain a model with the distinct classes (so as to build representations that discriminate between them), then pop the final network layer (the classifier) and replace it with a collapsed classifier, fitting it with the original labels. This gives you discriminating representations that are simply classified jointly.

In any case, I would advise you to construct the subclass-specific labels and check the per-subclass error while training with the original classes; this way you can quantify the prediction error you get and avoid over-engineering your network in case it can learn the task by itself, without stricter supervision.
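The "dummy subclasses, collapsed via post-processing" idea can be sketched as follows: the network outputs per-subclass probabilities, and a post-processing step sums the probabilities of subclasses belonging to the same logical class. The class indices, names, and logits below are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# subclass output index -> logical class it collapses into (made up)
groups = {0: "banana", 1: "banana", 2: "apple"}  # 0: regular, 1: mashed

logits = np.array([2.0, 1.5, 1.8])   # hypothetical network output
probs = softmax(logits)

# Collapse: sum subclass probabilities per logical class.
collapsed = {}
for sub, cls in groups.items():
    collapsed[cls] = collapsed.get(cls, 0.0) + probs[sub]

prediction = max(collapsed, key=collapsed.get)
```

Because the subclass probabilities are summed, the collapsed scores still form a valid distribution, so the post-processing changes only the label granularity, not the model.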

Decision Tree for Choosing Most Likely Option?

I'm trying to find the right ML algorithm. Let's say I have three data columns. I have a binary outcome for each column (either the column belongs to Group A or it does not), BUT in each set of three data columns that I feed in, exactly one and only one column belongs to Group A.
Which algorithm can I choose to select the ONE BEST result of the three each time? Can I do this with a decision tree?
A decision tree (e.g. ID3) can be suitable for this simple problem; the best way is to check it on your data and inspect its predictions.
ID3 has a problem with overfitting, though.
Basically, every classifier can do a good job on this task. If the data is linearly separable, even an SVM can be a good choice; I would also suggest trying a basic neural network with 1 or 2 nodes at the output layer for classifying the two groups.
All of them are implemented in various packages and are fairly easy to use (in almost any programming language).
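One way to enforce the "exactly one of three" constraint with any probabilistic classifier: score each candidate column with P(Group A) and pick the argmax across the three, rather than thresholding each binary prediction independently. A sketch with logistic regression on invented training data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy assumption: Group A members cluster around 5.0, non-members around 0.0.
X = np.concatenate([rng.normal(5.0, 1.0, (50, 2)),
                    rng.normal(0.0, 1.0, (50, 2))])
y = np.array([1] * 50 + [0] * 50)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# One query set of three columns, exactly one of which is in Group A.
candidates = np.array([[0.2, -0.1], [4.8, 5.3], [0.5, 0.9]])
scores = clf.predict_proba(candidates)[:, 1]   # P(Group A) per candidate
best = int(np.argmax(scores))                  # index of the single pick
```

The argmax guarantees exactly one column is chosen per triple, even if the classifier's independent binary decisions would have picked zero or two.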

objects classification, mutually related features

Looking for some inspirations on how to address the following problem:
there is a collection of multiple worlds,
each world has a collection of objects,
a single object, or a group of objects, may have a maximum of one category assigned,
some categories are mutually related - i.e., the fact that object1 in group1 belongs to categoryA increases the chance that some other group containing the same object1 belongs to categoryB
Having a dataset with multiple worlds fully described - the target is to take a completely new world and correctly categorize the objects and groups.
I would appreciate some ideas on how to address it.
My approach was to write classifiers that learn different characteristics of objects and groups from the training data, and then assign scores (a number between 0 and 1) to different combinations of objects in the unknown world. The problem I'm facing, though, is how to produce the final answer. With around 20 classifiers, each assigning scores to multiple groups, it's difficult to decide. For example, sometimes multiple classifiers return very small scores that sum up to a big number, and that overshadows the fact that one very rare classifier returned 1.
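The aggregation problem described above can be made concrete with a small numerical sketch (the score matrix is invented): summing raw scores lets many weakly-confident classifiers outvote one fully confident one, whereas taking the maximum per category preserves the rare strong signal.

```python
import numpy as np

# rows: 20 classifiers, columns: 3 candidate categories (values made up)
scores = np.full((20, 3), 0.05)
scores[:, 0] = 0.2    # many classifiers weakly prefer category 0
scores[7, 2] = 1.0    # one rare classifier is fully confident in category 2

by_sum = int(np.argmax(scores.sum(axis=0)))  # confident vote drowned out
by_max = int(np.argmax(scores.max(axis=0)))  # confident vote wins
```

Which aggregation is right depends on whether the individual scores are calibrated probabilities; if they are not, calibrating them (or learning per-classifier weights on held-out data) before combining is a common remedy.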
