Finding source of cause in Anomaly detection - machine-learning

We have an anomaly detection model using autoencoders which takes in 13 parameters. We want to detect which parameter is causing the anomaly.
So far we have been exploring how to do this but haven't come across anything. Can anyone suggest some algorithm(s) for this?

SHAP values are nice for this. The shap Python library has supported scikit-learn's IsolationForest since October 2019, so that would be the easiest method.
You should be able to use the DeepExplainer in shap for an autoencoder implemented in Keras/TensorFlow. There is also the generic/black-box KernelExplainer, which can be applied to any model.
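For example, here is a minimal KernelExplainer sketch that attributes the autoencoder's reconstruction error to the 13 input features. The names autoencoder, X_train, and X_anomalies are assumptions standing in for your trained Keras model and data:

    import numpy as np
    import shap

    # Per-sample mean squared reconstruction error, i.e. the anomaly score
    # whose feature attributions we want to inspect.
    def reconstruction_error(X):
        recon = autoencoder.predict(X)
        return np.mean((X - recon) ** 2, axis=1)

    background = shap.sample(X_train, 100)   # small background set keeps it fast
    explainer = shap.KernelExplainer(reconstruction_error, background)
    shap_values = explainer.shap_values(X_anomalies)
    # shap_values[i, j] is the contribution of parameter j to sample i's score;
    # the largest values point at the parameters driving that anomaly.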

Related

Different frameworks to do face matching

I am trying to establish the correspondence between two faces and report whether the two faces match or not.
To do this, I did some research and found the face-compare package (https://pypi.org/project/face-compare/), which is based on FaceNet; it allows me to do this and works very well. But now I want to compare the accuracy of this solution with other solutions to choose the best one. Does anyone have ideas for other solutions (open source or commercial) that could help with this benchmark?
The FaceNet work should be a good start. The network does good feature matching for facial data. Even though the face-compare library uses the same model, it would be good if you could fine-tune the FaceNet model on another dataset and evaluate it against the output from face-compare.
Apart from that, different variants of the siamese architecture can be tried for feature matching. If you want to compare the matching, try getting the triplet loss value for a set of images.
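For the benchmark itself, here is a minimal sketch of a scoring function, assuming each candidate model exposes some embedding of a face image: compare embeddings with cosine similarity and tune the match threshold per model on a validation set (the threshold value below is only illustrative):

    import numpy as np

    def match_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
        # Cosine similarity between L2-normalised embeddings; higher = more alike.
        a = emb_a / np.linalg.norm(emb_a)
        b = emb_b / np.linalg.norm(emb_b)
        return float(np.dot(a, b))

    def is_same_person(emb_a, emb_b, threshold: float = 0.7) -> bool:
        # Sweep the threshold on labelled pairs and report accuracy/ROC per model.
        return match_score(emb_a, emb_b) >= threshold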

Use feedback or reinforcement in machine learning?

I am trying to solve a classification problem. Many classical approaches seem to follow a similar paradigm: train a model with some training set and then use it to predict the class labels for new instances.
I am wondering if it is possible to introduce some feedback mechanism into the paradigm. In control theory, introducing a feedback loop is an effective way to improve system performance.
Currently, a straightforward approach on my mind is this: first we start with an initial set of instances and train a model with them. Then, each time the model makes a wrong prediction, we add the wrong instance to the training set. This is different from blindly enlarging the training set because it is more targeted. It can be seen as a kind of negative feedback in the language of control theory.
Is there any research going on with the feedback approach? Could anyone shed some light?
There are two areas of research that spring to mind.
The first is Reinforcement Learning. This is an online learning paradigm that allows you to get feedback and update your policy (in this instance, your classifier) as you observe the results.
The second is active learning, where the classifier gets to select examples from a pool of unclassified examples to get labelled. The key is to have the classifier choose the examples for labelling which best improve its accuracy by choosing difficult examples under the current classifier hypothesis.
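Here is a minimal uncertainty-sampling sketch of that idea, using scikit-learn and synthetic data rather than any particular active-learning library:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(1000, 5))
    y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)   # hidden "oracle" labels

    labelled = list(rng.choice(len(X_pool), size=20, replace=False))
    clf = LogisticRegression()

    for _ in range(10):
        clf.fit(X_pool[labelled], y_pool[labelled])
        proba = clf.predict_proba(X_pool)[:, 1]
        uncertainty = np.abs(proba - 0.5)      # 0 means the model is on the fence
        # Ask the oracle to label the most uncertain example not yet labelled.
        for idx in np.argsort(uncertainty):
            if idx not in labelled:
                labelled.append(int(idx))
                break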
I have used such feedback for every machine-learning project I have worked on. It allows training on less data (so training is faster) than selecting data randomly, and model accuracy also improves faster than with randomly selected training data. I work on image processing (computer vision) data, so another kind of selection I use is to add clustered false (wrong) examples instead of adding every single false example. This is because I assume there will always be some failures, so I only treat failures as useful additions when they are clustered in the same area of the image.
I saw this paper some time ago, which seems to be what you are looking for.
They are basically modeling classification problems as Markov decision processes and solving them with the ACLA algorithm. The paper is much more detailed than what I could write here, but ultimately they get results that outperform the multilayer perceptron, so this looks like a pretty effective method.

OpenCV Cascade Classification with Histogram of Oriented Gradients (HOGs) feature type

I am trying to use OpenCV's cascade classifier with the Histograms of Oriented Gradients (HOG) feature type, as in the paper "Fast Human Detection Using a Cascade of Histograms of Oriented Gradients".
Searching the web, I found that the OpenCV Cascade Classifier only supports the HAAR/LBP feature types (OpenCV Cascade Classification).
Is there a way to use HOG with the OpenCV cascade classifier? What do you suggest?
Is there a patch or another library that I can use?
Thanks in advance!
EDIT 1
I kept searching and finally found in android-opencv that there is a revision of the Cascade Classifier in the trunk which allows it to work with HOG features. But I don't know if it works...
Link: http://code.opencv.org/projects/opencv/repository/revisions/6853
EDIT 2
I have not tested the fork above because my problem has changed. But I found an interesting link which may be very useful in the future (when I come back to this problem).
This page contains the source code for the paper "Histograms of Oriented Gradients for Human Detection", along with more information: http://pascal.inrialpes.fr/soft/olt/
If you use OpenCV-Python, then you have the option of using some additional libraries, such as scikits.image, that have Histogram of Oriented Gradient built-ins.
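For instance, here is a minimal scikit-image sketch computing one HOG descriptor with typical Dalal-Triggs parameters (the sample image and parameter values are only illustrative):

    from skimage import color, data
    from skimage.feature import hog

    image = color.rgb2gray(data.astronaut())   # any grayscale image works

    # 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks.
    features = hog(image,
                   orientations=9,
                   pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))
    print(features.shape)   # one flat HOG feature vector for the whole image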
I had to solve exactly this same problem a few months ago, and documented much of the work (including very basic Python implementations of HoG, plus GPU implementations of HoG using PyCUDA) at this project page. There is code available there. The GPU code should be reasonably easy to modify for use in C++.
It now seems to be available in the non-Python code as well: opencv_traincascade in 2.4.3 has a HOG feature type option (which I did not try):
[-featureType <{HAAR(default), LBP, HOG}>]
Yes, you can use cv::CascadeClassifier with HOG features. To do this, just load it with the hogcascade_pedestrians.xml file that you can find in opencv_src-dir/data/hogcascades.
The classifier works faster and its results are much better when trained with the HOG cascade compared with the Haar cascade...
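Here is a minimal sketch of running such a cascade from the Python bindings; the image file name is an assumption, and very old cascade XML formats may not load in recent OpenCV builds:

    import cv2

    # hogcascade_pedestrians.xml comes from the data/hogcascades directory
    # mentioned above.
    cascade = cv2.CascadeClassifier("hogcascade_pedestrians.xml")

    img = cv2.imread("street.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # detectMultiScale is used exactly as with Haar/LBP cascades.
    pedestrians = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    for (x, y, w, h) in pedestrians:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("street_detected.jpg", img)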

Is it possible to see the current iteration number in OpenCV's cvKmeans2?

I'm trying to cluster a really large dataset (3030764x162) into 4000 clusters using the cvKmeans2 function in OpenCV 2.1.
I would like to see which iteration the K-means algorithm is currently in (similar to what is displayed in Matlab), but I don't see any documentation that points to how I can do this.
It's kind of frustrating seeing a blank screen and not knowing when the code is going to terminate!
Thank you.
Unfortunate as it seems, the answer is No, you cannot. There are no debugging/informative statements anywhere in the kmeans function as provided by OpenCV. However, you may edit and add statements to the method as you deem appropriate.
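One rough workaround, sketched here with the newer cv2.kmeans Python bindings rather than the cvKmeans2 C API from the question, is to drive the clustering one iteration per call and print your own progress (the sizes below are small placeholders):

    import cv2
    import numpy as np

    data = np.random.rand(5000, 162).astype(np.float32)
    K = 50
    criteria = (cv2.TERM_CRITERIA_MAX_ITER, 1, 0.0)   # exactly one iteration per call

    labels = None
    flags = cv2.KMEANS_RANDOM_CENTERS
    for it in range(20):
        compactness, labels, centers = cv2.kmeans(data, K, labels, criteria, 1, flags)
        print(f"iteration {it + 1}: compactness = {compactness:.2f}")
        flags = cv2.KMEANS_USE_INITIAL_LABELS         # continue from the last labels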
@Sau,
Maybe you need some other way of doing it, though my answer is not specific to OpenCV.
I have not tried this in OpenCV, but I once did K-means clustering on an extremely large dataset, and it was a better option than OpenCV because it worked in a distributed mode. Though the process is quite lengthy, you might still be interested: it's K-means clustering using Mahout.
Check it out.

Using Weka for Game Playing

I am doing a project where I have neural networks (or other algorithms) play each other in poker. After each win or loss, I want the neural network (or other algorithm) to update in response to the error of the loss (how this is calculated is unimportant here).
Weka is very nice and I don't want to reinvent the wheel. However, Weka's API seems primarily designed to train from a dataset. Game playing doesn't use a dataset. Rather, the network plays, and then I want it to update itself based on its loss.
Is it possible to use the Weka API to update a network not from a dataset but on one instance at a time, over and over again? Am I thinking about this right?
The other idea I want to implement is to use a genetic algorithm to update the weights of a neural network, instead of the backpropagation algorithm. As far as I can tell, there is no way to manually specify the weights of a neural network in Weka. This, of course, is vital if using a genetic algorithm for this purpose.
Please help :) Thank you.
Normally, Weka learning algorithms are batch learning algorithms. What you need is an incremental classifier.
From weka docs
Most classifiers need to see all the data before they can be trained, e.g., J48 or SMO. But there are also schemes that can be trained in an incremental fashion, not just in batch mode. All classifiers implementing the weka.classifiers.UpdateableClassifier interface are able to process data in such a way.
See the UpdateableClassifier interface to find out which classifiers implement it.
You may also look at MOA (Massive Online Analysis), a tool closely related to Weka; all of its classifiers are incremental due to the constraints of online learning.
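This is Weka/MOA territory, but as a conceptual illustration of the same incremental idea, here is a minimal scikit-learn sketch that updates a classifier one instance at a time via partial_fit (the data is a synthetic stand-in for played hands):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    clf = SGDClassifier()
    classes = np.array([0, 1])        # all class labels must be declared up front

    rng = np.random.default_rng(0)
    for hand in range(1000):
        features = rng.normal(size=(1, 10))   # stand-in for one hand's features
        outcome = int(features.sum() > 0)     # stand-in for win/loss
        clf.partial_fit(features, [outcome], classes=classes)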
Weka, as far as I can tell, does not do online learning (which is what you're asking about).
It might be better to investigate using competitive analysis for your game.
You may have to reinvent the wheel here. I don't think it's a bad use of time.
I'm currently implementing a learning classifier system, which is pretty simple. I'd also advise looking into these kinds of algorithms. There is an implementation on the internet, but I still prefer to code my own.
