As I am still learning, I am implementing a neural network for a two-class problem. Let’s call the classes class A and class B. Before training, the first thing I do is collect data from both classes and then shuffle it. I am aware of how to split the data into multiple sets (training, validation, and test).
My question is about what the data should actually contain. If we want to find out whether new, unseen data belongs to class A, and we treat the data belonging to class A as the positive class and all of the remaining data as negative, should the negative data really contain a collection of different classes such as B, C, D, etc.?
In terms of percentages, my question is whether, say, 50% of the data should belong to class A (positive), while the remaining 50% should be split e.g. 33% B, 33% C, and 33% D (all of which are negative)?
So basically I have multiple classes: A, B, C, D, and so on. But I only want to find out whether something is class A or not. Is that why it is still considered a two-class problem (positive/negative), even though I actually use several classes? As you can probably tell, I'm quite confused.
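To make sure I describe the setup correctly, here is the one-vs-rest relabeling I have in mind (the class names and label list are invented for illustration):

```python
# One-vs-rest relabeling: class "A" becomes the positive class (1),
# every other class (B, C, D, ...) becomes the negative class (0).
labels = ["A", "B", "A", "C", "D", "B", "A"]
binary = [1 if y == "A" else 0 for y in labels]
print(binary)  # [1, 0, 1, 0, 0, 0, 1]
```

The network then only ever sees the binary labels, even though the negative data comes from several original classes.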
Related
I’ve questioned the way the famous Cleveland heart disease dataset labels its objects here.
This dataset is very imbalanced (many objects of the “no disease” class). I noticed that many papers using this dataset combine all the other classes, reducing the task to binary classification (disease vs. no disease).
Are there other ways to deal with this class imbalance problem, rather than reducing the number of classes, to get good results from a classifier?
Generally speaking, when handling an imbalanced dataset, one option is to use an unsupervised learning approach.
You may use the Multivariate Normal Distribution.
In your case, if you have many elements in one class and very few in the other, a supervised learning method may not be appropriate. The Multivariate Normal Distribution, used as an unsupervised machine learning approach, may therefore be a solution. The algorithm learns from the data and finds the values that define it (i.e. the most important part of the data, here the "no disease" cases). Once these values are output, one can search for the elements that do not fit them; these elements are the so-called "abnormal elements" or "anomalies". In your case, these are the "disease" individuals.
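A minimal NumPy sketch of this anomaly-detection idea (the data here is synthetic; in practice you would fit only on the majority "no disease" examples and choose the threshold on a validation set):

```python
import numpy as np

rng = np.random.default_rng(0)
# Fit the Gaussian on majority-class data only (here: synthetic 2-D points).
X_majority = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
mu = X_majority.mean(axis=0)
cov = np.cov(X_majority, rowvar=False)
cov_inv = np.linalg.inv(cov)

def log_density(x):
    """Unnormalised log-density of the fitted multivariate normal."""
    d = x - mu
    return -0.5 * d @ cov_inv @ d

# Points far from the fitted distribution get a low density -> "anomaly".
threshold = -8.0                      # chosen by hand for this sketch
normal_point = np.array([0.1, -0.2])
odd_point = np.array([6.0, 6.0])
print(log_density(normal_point) > threshold)  # True  -> looks normal
print(log_density(odd_point) > threshold)     # False -> flagged as anomaly
```

Elements whose density falls below the threshold are reported as anomalies, i.e. the candidate "disease" cases.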
A second solution would be to balance your dataset and use the original supervised learning algorithm. You can do that using the following techniques. These statements are generally sound, but they depend a lot on the data you have (mind, I do not have access to your input data!), so you should test them and see which one best fits your purpose.
Collect more elements for the class with few elements.
Duplicate the elements in the class with fewer elements, in order to obtain the same amount of data in both classes as in the class with more elements. There is a problem with this solution when there is a great difference in data volume between the two classes and you use a neural network: the class with duplicated elements will not be very varied, and neural networks give good results only when trained on a large amount of varied data.
Use less data from the class with more elements, in order to have the same number of elements in both classes as in the class with few elements. Here too there might be a problem when using a neural network, because training it with less data might not give good results. Be careful also to have more input elements than features, otherwise it will not work.
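Both balancing techniques can be sketched like this (the indices stand in for your real samples; the class sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
majority = np.arange(100)          # 100 majority-class sample indices
minority = np.arange(10)           # 10 minority-class sample indices

# Oversampling: duplicate minority samples (with replacement) up to majority size.
oversampled = rng.choice(minority, size=len(majority), replace=True)

# Undersampling: keep a random subset of the majority, matching minority size.
undersampled = rng.choice(majority, size=len(minority), replace=False)

print(len(oversampled), len(undersampled))  # 100 10
```

The oversampled minority class then contains many duplicates, which is exactly the lack of variety warned about above.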
We are building a neural network to classify objects and have a large dataset of images for 1000 classes. One of the classes is “banana” and it contains 1000 images of banana. Some of those images (about 10%) are of mashed bananas, which are visually very different from the rest of the images in that class.
If we want both mashed bananas and regular bananas to be classified, should we split the banana images into two separate classes and train separately, or keep the two subsets merged?
I am trying to understand how the presence of a visually distinct subclass impacts the recognition of a given class.
The problem here is simple. You need your neural network to learn both groups of images. That means you need to back-propagate sensible error information. If you do have the ground truth information about mashed bananas, back-propagating that is definitely useful. It helps the first layers learn two sets of features.
Note that the nice thing about neural networks is that you can back-propagate any kind of error vector. If your output has three nodes (banana, non-mashed banana, mashed banana), you basically sidestep the binary choice implied in your question. You can always drop output nodes during inference.
There is no standard answer to be given here; it might be very hard for your network to generalize over classes whose subclasses are distinct in the feature space, in which case introducing multiple dummy classes that you collapse into a single one via post-processing would be the ideal solution.

You could also pretrain a model with distinct classes (so as to build representations that discriminate between them), then pop the final network layer (the classifier) and replace it with a collapsed classifier, fitting with the initial labels. This would give you discriminating representations that are simply classified together.

In any case, I would advise you to construct the subclass-specific labels and check the per-subclass error while training with the original classes; this way you will be able to quantify the prediction error you get and avoid over-engineering your network in case it can learn the task by itself without stricter supervision.
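The collapse-via-post-processing idea can be sketched like this (the three-node output layout and the logits are hypothetical):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical 3-node output: [regular banana, mashed banana, not banana].
logits = np.array([2.0, 1.0, 0.5])
p = softmax(logits)

# Collapse the two banana subclasses into one "banana" probability
# during post-processing / inference.
p_banana = p[0] + p[1]
p_not_banana = p[2]
print(round(p_banana + p_not_banana, 6))  # 1.0
```

Training uses the fine-grained subclass labels, while the final prediction only reports the merged class.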
I am stuck on a problem with hierarchical data, say A -> B -> C (smallest to biggest), where the smallest unit of data is a block (A consists of multiple blocks, B consists of multiple A's, and C consists of multiple B's), and I want to classify blocks into labels.

The block labels for each group of A are independent of the block labels for another group of A; however, the "trends or patterns" followed by the data could be similar, and that is what is to be learnt. The complexity I am facing is variable input sizes. I cannot possibly train a single neural network per group of A, since there are a large number of them. So I am thinking in terms of groups at level B, but how could I create a scheme that handles these variable input sizes?

Each block is represented by a one-dimensional array of the total number of labels in the group of A it belongs to. I also have the hierarchy information for every block (smallest unit). Any help would be appreciated. Thanks!
Please bear with me, I am still quite new to deep learning, so what I am looking for may or may not make a lot of sense. But this is why I ask: I need some guidance on where to find a guiding example or paper.
Here is what I want and have:
I am interested in sequence data (time-series, actually), hence I am using RNN/LSTM/GRU and those types of things
Now suppose I have a 1-D time-series; let's call it X = [x_1, ..., x_n]
For my particular problem, it turns out that X is a function of another function, a generator function if you will s.t. X = f(a,b)
That function takes two integer parameters a and b
Here is my problem: I want to find the value of a and b that best reconstructs my time-series X (assume that I can generate time-series with f(a,b))
This leads me to believe that I must include the actual values of the network output, i.e. a and b, in my objective function
My objective function could be something like objectiveFunction(X_true,X_pred) but then my X_pred is generated from f with parameters a and b
Further, the batch size may need to be the whole time-series (they are small, and I have many examples) but we can use big mini batches if needs be.
Suppose the search space over a and b is [0, 10] for both (again, a and b can only assume integer values). Then I have 11 × 11 = 121 pairs, e.g. (6, 7), for values of a and b. As I train the network, I expect the weights of the edges leading from the Dense layer to the categorical outputs (i.e. my parameter pairs (a, b)) to be maximised as the network finds better and better time-series generator parameters a and b.
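For concreteness, here is how I picture mapping the (a, b) grid to categorical class indices (note that an inclusive integer range [0, 10] gives 11 values per parameter):

```python
# Flatten the integer grid of (a, b) pairs into categorical class indices.
# An inclusive range [0, 10] gives 11 values per parameter: 11 * 11 = 121 pairs.
values = range(11)
pairs = [(a, b) for a in values for b in values]
index_of = {pair: i for i, pair in enumerate(pairs)}

print(len(pairs))        # 121
print(index_of[(6, 7)])  # 6 * 11 + 7 = 73
print(pairs[73])         # (6, 7)
```

The network's categorical output would then have one node per pair, and the predicted class index maps straight back to (a, b).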
Initially, I was just going to test a network structure as such:
Input -->> RNN -->> Dense -->> Categorical output
I want to keep it simple for now and not try anything fancier such as an LSTM as my time-series only have short-term dependencies.
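To make the structure concrete before wiring it up in a framework, here is a rough NumPy-only forward pass of what I mean (random weights, no training; the 121 output classes assume the inclusive [0, 10] × [0, 10] integer grid for a and b):

```python
import numpy as np

rng = np.random.default_rng(42)
n_classes = 121                     # one class per (a, b) pair on an 11x11 grid
hidden = 16

# Randomly initialised weights -- a forward pass only, no training.
W_xh = rng.normal(scale=0.1, size=(1, hidden))          # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))     # hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(hidden, n_classes))  # hidden -> output

def forward(x_seq):
    """Simple tanh RNN over a 1-D series, then a dense softmax head."""
    h = np.zeros(hidden)
    for x_t in x_seq:
        h = np.tanh(np.array([x_t]) @ W_xh + h @ W_hh).ravel()
    logits = h @ W_hy
    e = np.exp(logits - logits.max())
    return e / e.sum()              # probability over the 121 (a, b) pairs

probs = forward(np.sin(np.linspace(0, 3, 50)))
print(probs.shape, round(probs.sum(), 6))  # (121,) 1.0
```

The loss would then be a categorical cross-entropy against the index of the true (a, b) pair, assuming that pair is known (or found by search) for each training series.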
Hence, any advice would be most welcome. Thanks in advance.
Looking for some inspirations on how to address the following problem:
there is a collection of multiple worlds,
each world has a collection of objects,
a single object, or a group of objects, may have a maximum of one category assigned,
some categories are mutually related, i.e. the fact that object1 in group1 belongs to categoryA increases the chance that some other group containing the same object1 belongs to categoryB
Having a dataset with multiple worlds fully described - the target is to take a completely new world and correctly categorize the objects and groups.
I would appreciate some ideas on how to address it.
My approach was to write classifiers that learn different characteristics of objects and groups from the training data, and then assign scores (a number between 0 and 1) to different combinations of objects in the unknown world. The problem I'm facing, though, is how to produce the final answer. With around 20 classifiers, each assigning scores to multiple groups, it is difficult to decide. For example, sometimes multiple classifiers return very small scores that sum up to a big number, and that overshadows the fact that one very rare classifier returned 1.
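To illustrate the aggregation problem with invented numbers: a plain sum of scores lets many weak signals drown out one strong, rare one, while a different combination rule (here simply the max, just for contrast) behaves the opposite way.

```python
# Twenty classifiers each give group X a weak score, while a single
# rare classifier gives group Y a score of 1.0 (all numbers invented).
scores_x = [0.125] * 20   # many small scores
scores_y = [1.0]          # one very confident, rare classifier

print(sum(scores_x))  # 2.5 -- the plain sum favours X...
print(sum(scores_y))  # 1.0
print(max(scores_x))  # 0.125 -- ...while the max favours Y
print(max(scores_y))  # 1.0
```

This is exactly the ambiguity described above: the final answer depends heavily on how the per-classifier scores are combined.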