Which ML algorithm to choose - machine-learning

In my lab, I have 10 devices that I monitor using device-specific features such as:
heat-generated
power consumed
patterns in power consumption
Using a supervised classification model, I can classify these devices.
The problem is what happens when we add more devices of a different type: how do I classify them? A model trained on the existing devices will assign every new device to one of the classes it already knows, which is wrong, because the new devices may have patterns of their own.
Is there a way to handle this, and if so, how?

It looks like adding a new type of device to your dataset actually adds a new class.
In that case, you will probably have to retrain your model so that it accommodates the new classes in your dataset.
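To illustrate the point about retraining, here is a minimal sketch using a nearest-centroid classifier written with the standard library only. It is not the model from the question; the feature values, the device labels, and the helper names `train_centroids` and `predict` are all hypothetical. It shows that a model trained on the old classes has to pick one of them for a new device, and that retraining with labelled examples of the new device adds the missing class.

```python
import math

def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical readings: (heat generated, power consumed).
old_data = [
    ([30.0, 100.0], "device_a"), ([32.0, 105.0], "device_a"),
    ([60.0, 300.0], "device_b"), ([62.0, 310.0], "device_b"),
]
new_device_reading = [90.0, 800.0]

# The old model only knows device_a and device_b, so it must pick one of them.
model = train_centroids(old_data)
before = predict(model, new_device_reading)

# After retraining with labelled samples of the new device, the class exists.
new_data = old_data + [([88.0, 790.0], "device_c"), ([92.0, 810.0], "device_c")]
model = train_centroids(new_data)
after = predict(model, new_device_reading)

print(before, after)
```

The same idea carries over to any supervised classifier: the set of possible outputs is fixed at training time, so a new class only appears after retraining on data that contains it.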

Related

Algorithm to classify instances from a dataset similar to another smaller dataset, where this smaller dataset represents a single class

I have a dataset that represents instances from a binary class. The twist here is that there are only instances from the positive class and I have none of the negative one. Or rather, I want to extract those from the negatives which are closer to the positives.
To make this more concrete, let's say we have data on people who bought from our store and asked for a loyalty card, either at that moment or later, of their own volition. Privacy concerns aside (it's just an example), we have different attributes like age, postcode, etc.
The other set of clients, continuing with our example, are clients who did not apply for the card.
What we want is to find a subset of those that are most similar to the ones that applied for the loyalty card in the first group, so that we can send them an offer to apply for the loyalty program.
It's not exactly a classification problem because we are trying to get instances from within the group of "negatives".
It's not exactly clustering, which is typically unsupervised, because we already know a cluster (the loyalty card clients).
I thought about using kNN, but I don't really know what my options are here.
I would also like to know how, if possible, this can be achieved with Weka or another Java library, and whether I should normalize all the attributes.
You could use anomaly detection algorithms. These algorithms tell you whether your new client belongs to the group of clients who got a loyalty card or not (in which case they would be an anomaly).
There are two basic ideas (coming from the article I linked below):
You transform the feature vectors of your positive labelled data (clients with card) to a vector space with a lower dimensionality (e.g. by using PCA). Then you can calculate the probability distribution for the resulting transformed data and find out whether a new client belongs to the same statistical distribution or not. You can also compute the distance of a new client to the centroid of the transformed data and decide by using the standard deviation of the distribution whether it is still close enough.
The Machine Learning Approach: You train an auto-encoder network on the clients with card data. An auto-encoder has a bottleneck in its architecture. It compresses the input data into a new feature vector with a lower dimensionality and tries afterwards to reconstruct the input data from that compressed vector. If the training is done correctly, the reconstruction error for input data similar to the clients with card dataset should be smaller than for input data which is not similar to it (hopefully these are clients who do not want a card).
Have a look at this tutorial for a start: https://towardsdatascience.com/how-to-use-machine-learning-for-anomaly-detection-and-condition-monitoring-6742f82900d7
Both methods require standardizing the attributes first.
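The distance-to-centroid idea from the first approach can be sketched as follows. This is a simplified illustration under stated assumptions, not the method from the linked tutorial: it skips the PCA step, standardizes each attribute with z-scores, and uses a cut-off of 3 standard deviations that is an arbitrary choice. The attributes (age and a yearly-spend figure), the data, and the names `fit` and `score` are hypothetical.

```python
import math
import statistics

def fit(positives):
    """Compute the per-attribute mean and standard deviation of the positive class."""
    columns = list(zip(*positives))
    means = [statistics.mean(c) for c in columns]
    # Fall back to 1.0 for constant attributes to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds

def score(means, stds, x):
    """Distance from x to the centroid, measured in standardized (z-score) units."""
    return math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(x, means, stds)))

# Hypothetical card holders: (age, yearly spend).
card_holders = [[25, 40000], [30, 42000], [35, 45000], [28, 41000], [32, 44000]]
means, stds = fit(card_holders)

THRESHOLD = 3.0  # assumption: roughly "within three standard deviations"

similar = score(means, stds, [29, 42000])     # close to the card holders
different = score(means, stds, [70, 150000])  # far from the card holders

print(similar < THRESHOLD, different < THRESHOLD)
```

A client whose score falls below the threshold looks like the card holders; the scores can also be used directly to rank clients instead of applying a hard cut-off.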
You could also try a one-class support vector machine.
This approach tries to model the boundary of the class and gives you a binary decision on whether a point belongs to the class or not. It can be seen as a simple form of density estimation. The main benefit is that the set of support vectors will be much smaller than the training data.
Or simply use the nearest-neighbor distances to rank users.
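The nearest-neighbor ranking can be sketched with the standard library alone. This is an illustration only: the client names and feature values are hypothetical, the helpers `nn_distance` and `rank_by_similarity` are made up for the example, and the attributes are assumed to be standardized already.

```python
import math

def nn_distance(candidate, positives):
    """Distance from a candidate to its nearest neighbor in the positive class."""
    return min(math.dist(candidate, p) for p in positives)

def rank_by_similarity(candidates, positives):
    """Return candidate names ordered from most to least similar to the positives."""
    return sorted(candidates, key=lambda name: nn_distance(candidates[name], positives))

# Hypothetical, already standardized feature vectors.
positives = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]   # clients with a card
candidates = {                                       # clients without a card
    "client_x": [2.5, 2.0],
    "client_y": [0.2, 0.1],
    "client_z": [1.0, 0.8],
}

ranking = rank_by_similarity(candidates, positives)
print(ranking)
```

The clients at the top of the ranking are the ones closest to an existing card holder, so they would be the first to receive the offer. Using the average distance to the k nearest positives instead of the single nearest one is a common variation that is less sensitive to outliers.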

Machine Learning Algorithm for Dynamic Environments

Which methods are best for managing, predicting, and labeling data in a dynamic environment? The data distribution of the system changes over time; it is not static. The system can have different normal settings, and each setting has its own distribution of normal data. Suppose we have two classes, normal and abnormal. We cannot rely on historical data and train a simple classifier to predict future observations, because the data distribution can change one day after training and make the old observations irrelevant to the new ones. Consider the following figure:
The blue and red distributions are both normal data, but under different settings, and at training time we have only one setting. This data comes from a single sensor. Suppose we train a model on the blue distribution together with some abnormal samples. Think of the abnormal samples as normal samples with a little noise or a measurement fault. When we test the model, the setting has changed and the test observations now follow the red distribution, so the model misclassifies the samples.
What are the best methods for a situation like this? Note that I have tried several clustering algorithms, but they cannot distinguish between normal and abnormal samples.
Any suggestions and help are very welcome. Thanks.
There are plenty of books on time series data, and on change detection in particular.
Your example can arguably be treated as a change in mean, and there are statistical models for detecting this.
Basseville, Michèle, and Igor V. Nikiforov. Detection of abrupt changes: theory and application. Vol. 104. Englewood Cliffs: Prentice Hall, 1993.
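As one concrete example of a change-in-mean detector, here is a minimal sketch of a two-sided CUSUM test, a classical technique in the change detection literature. The answer does not name a specific algorithm, so this choice is an assumption, as are the sensor readings and the values of `target_mean`, `slack`, and `threshold`, which would have to be tuned for real data.

```python
def cusum_change_point(values, target_mean, slack, threshold):
    """Two-sided CUSUM: return the index of the first alarm, or None.

    target_mean: the mean expected under the current normal setting
    slack:       deviations smaller than this are ignored as noise
    threshold:   accumulated deviation required to raise an alarm
    """
    high = low = 0.0
    for i, x in enumerate(values):
        # Accumulate evidence for an upward and a downward shift separately.
        high = max(0.0, high + (x - target_mean - slack))
        low = max(0.0, low + (target_mean - x - slack))
        if high > threshold or low > threshold:
            return i
    return None

# Hypothetical sensor readings: the mean shifts from about 10 to about 12 at index 6.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 12.1, 11.9, 12.0, 12.2, 11.8]

alarm = cusum_change_point(readings, target_mean=10.0, slack=0.5, threshold=2.5)
no_alarm = cusum_change_point(readings[:6], target_mean=10.0, slack=0.5, threshold=2.5)
print(alarm, no_alarm)
```

Once an alarm fires, the system could switch to a model for the new setting, or re-estimate what "normal" looks like from the samples collected after the change.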

MobileNet vs SqueezeNet vs ResNet50 vs Inception v3 vs VGG16

I have recently been looking into incorporating the machine learning release for iOS developers into my app. Since this is my first time using anything ML-related, I was very lost when I started reading the descriptions of the different models that Apple has made available. They have the same purpose and description; the only difference is the file size. What is the difference between these models, and how would I know which one is the best fit?
The models Apple makes available are just for simple demo purposes. Most of the time, these models are not sufficient for use in your own app.
The models on Apple's download page are trained for a very specific purpose: image classification on the ImageNet dataset. This means they can take an image and tell you what the "main" object is in the image, but only if it's one of the 1,000 categories from the ImageNet dataset.
Usually, this is not what you want to do in your own apps. If your app wants to do image classification, typically you want to train a model on your own categories (like food or cars or whatever). In that case you can take something like Inception-v3 (the original, not the Core ML version) and re-train it on your own data. That gives you a new model, which you then need to convert to Core ML again.
If your app wants to do something other than image classification, you can use these pretrained models as "feature extractors" in a larger neural network structure. But again this involves training your own model (usually from scratch) and then converting the result to Core ML.
So only in a very specific use case -- image classification using the 1,000 ImageNet categories -- are these Apple-provided models useful to your app.
If you do want to use any of these models, the difference between them is speed vs. accuracy. The smaller models are fastest but also least accurate. (In my opinion, VGG16 shouldn't be used on mobile. It's just too big and it's no more accurate than Inception or even MobileNet.)
SqueezeNets are fully convolutional and use Fire modules. Each Fire module has a squeeze layer of 1x1 convolutions, which vastly reduces the number of parameters because it restricts the number of input channels seen by the layers that follow. This, together with the absence of dense layers, gives SqueezeNets very low latency.
MobileNets use depthwise separable convolutions, which are very similar to the Inception towers in Inception. These also reduce the number of parameters and hence latency. MobileNets also have useful model-shrinking parameters that you can set before training to make the model the exact size you want. The Keras implementation can use ImageNet pre-trained weights too.
The other models are very deep, large models. The reduced number of parameters / style of convolution is not used for low latency but just for the ability to train very deep models, essentially. ResNet introduced residual connections between layers which were originally believed to be key in training very deep models. These aren't seen in the previously mentioned low latency models.
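To make the parameter savings mentioned above concrete, the following sketch counts weights for a standard convolution, a depthwise separable convolution, and a Fire module. Biases are ignored, and the layer sizes (128 input channels, 16 squeeze filters, 64 + 64 expand filters) are picked only for illustration.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, followed by a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

def fire_module_params(c_in, squeeze, expand_1x1, expand_3x3):
    """1x1 squeeze layer, followed by parallel 1x1 and 3x3 expand layers."""
    squeeze_weights = c_in * squeeze
    expand_weights = squeeze * expand_1x1 + 3 * 3 * squeeze * expand_3x3
    return squeeze_weights + expand_weights

standard = standard_conv_params(3, 128, 128)
separable = depthwise_separable_params(3, 128, 128)
fire = fire_module_params(128, squeeze=16, expand_1x1=64, expand_3x3=64)

print(standard, separable, fire)  # 147456 17536 12288
```

All three layers map 128 channels to 128 channels, but the depthwise separable version needs roughly 8 times fewer weights than the standard 3x3 convolution, and the Fire module roughly 12 times fewer.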

How to best deal with a feature relating to what type of expert labelled the data that becomes unavailable at point of classification?

Essentially, I have a data set in which each item has a feature vector and a label indicating whether it is spam or non-spam.
To get the labels for this data, two distinct types of expert were used, each taking a different approach to evaluating the item. The type of expert used then also became a feature in the vector.
Training a Random Forest and then testing it on a separate portion of the data achieved a high degree of accuracy.
However, it is now clear that the feature describing the expert who made the label will not be available in a live environment. So I have tried a number of approaches to reflect this:
Remove the feature from the set and retrain and test
Split the data into 2 distinct sets based on the feature, and then train and test 2 separate classifiers
For the test data, set the feature in question all to the same value
With all three approaches, the classifiers have dropped from being highly accurate to being virtually useless.
So I am looking for any advice or intuitions as to why this has occurred, and how I might resolve it so as to regain some of the accuracy I was previously seeing.
To be clear, I have no background in machine learning or statistics and am simply using a third-party C# code library as a black box to achieve these results.
Sounds like you've completely overfit to the "who labeled what" feature (and combinations of this feature with other features). You can find out for sure by inspecting the random forest's feature importances and checking whether the annotator feature ranks high. Another way to find out is to let the annotators check each other's annotations and compute an agreement score such as Cohen's kappa. A low value, say less than .5, indicates disagreement among the annotators, which makes machine learning very hard.
Since the feature will not be available at test time, there's no easy way to get the performance back.
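The agreement score mentioned above, Cohen's kappa, can be computed in a few lines. This is a minimal sketch; the two label lists are made-up data for ten items that both experts labelled, and the function does not handle the degenerate case where expected agreement is exactly 1.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from the two types of expert for the same ten items.
expert_1 = ["spam", "spam", "non-spam", "non-spam", "spam",
            "non-spam", "non-spam", "spam", "non-spam", "non-spam"]
expert_2 = ["spam", "non-spam", "non-spam", "non-spam", "spam",
            "spam", "non-spam", "spam", "non-spam", "spam"]

kappa = cohens_kappa(expert_1, expert_2)
print(round(kappa, 2))
```

Here the experts agree on 7 of 10 items, but half of that agreement is expected by chance, which gives a kappa of about 0.4. By the rule of thumb in the answer, that would signal substantial disagreement between the two types of expert.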

Data mining, Machine Learning : Click prediction using Logit

I am an ML noob. My task is to predict click probability given user information such as city, state, OS version, OS family, device, browser family, browser version, etc.
I have been advised to try logit, since it seems to be what MS and Google use too.
I have some questions regarding logistic regression:
Click versus non-click is a very unbalanced class distribution, and the predictions from a simple GLM do not look good. How can I work around this imbalance?
All my variables are categorical, and some of them, like device and city, can take a very large number of values. Some devices and cities also occur very rarely. How should I deal with such a wide and sparsely populated variety of categorical values?
One of the variables we get is the device ID. This is a highly unique feature that effectively identifies a user. How can I make use of it in logit, or should it be used in a completely different model based on user identity?

Resources