Reusing image-to-image GANs for spatial denoising of trajectories

I work on particle tracking experiments that generate trajectories (x and y coordinates over time) from videos. Some experimental setups result in trajectories with a lot of spatial noise.
I'm looking into using machine-learning models to denoise those trajectories, as the algorithmic methods available to us are limited. My goal is to train the model with two inputs: simulated trajectories as the ground truth, and the same trajectories with added noise.
So far, most of the solutions I have found for multiple-input models that don't do classification or regression point to CNNs. However, I came across image-to-image denoising models (such as https://arxiv.org/abs/1611.07004) that seem to rely on the same relation between their inputs, although the data has a different shape.
Would it be feasible to adapt such a model for this purpose?
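One way to build the paired training data described above is sketched below. It assumes, purely for illustration, that clean tracks are 2D random walks and that the measurement noise is independent Gaussian noise on each coordinate; a real setup would plug in its own motion model and noise level.

```python
import numpy as np

def make_training_pairs(n_traj, n_steps, step_sd=1.0, noise_sd=0.5, seed=0):
    """Simulate clean 2D trajectories and noisy copies of them.

    Returns (clean, noisy), each of shape (n_traj, n_steps, 2): x and y over time.
    Assumption (illustrative only): random-walk motion plus i.i.d. Gaussian
    localisation noise; replace with your own simulator.
    """
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, step_sd, size=(n_traj, n_steps, 2))
    clean = np.cumsum(steps, axis=1)                      # positions over time
    noisy = clean + rng.normal(0.0, noise_sd, size=clean.shape)
    return clean, noisy

clean, noisy = make_training_pairs(n_traj=100, n_steps=200)
```

Stored this way, each trajectory is a 1D sequence with two channels (x and y), so an image-to-image architecture would presumably need its 2D convolutions replaced by 1D convolutions over time.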

Related

SMOTE oversampling for anomaly detection using a classifier

I have sensor data and I want to do live anomaly detection. My plan is to use LOF on the training set to detect anomalies, then train a classifier on the labeled data to classify new data points. I thought about using SMOTE because I want more anomaly points in the training data to overcome the class imbalance, but the issue is that SMOTE created many points that are inside the normal range.
How can I do oversampling without creating samples in the normal data range?
[Figure: the data before applying SMOTE]
[Figure: the data after applying SMOTE]
SMOTE creates each synthetic point by linear interpolation between a minority-class sample and one of its k nearest minority-class neighbors. This means that you're going to end up with points between a sample and its neighbors. When the minority samples are scattered all over the place like this, it makes sense that many synthetic points end up in the middle, inside the normal range.
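To see why, here is a minimal numpy sketch of SMOTE's interpolation step (a simplification for illustration, not imbalanced-learn's implementation). With only two anomalies on opposite sides of the normal data, every synthetic point lands on the segment between them, i.e. straight through the normal range.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """SMOTE-style oversampling of minority samples X_min, shape (n, d)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # Pairwise distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                    # a sample is not its own neighbor
    k = min(k, n - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]       # k nearest minority neighbors
    base = rng.integers(0, n, size=n_new)          # random minority sample
    nb = neighbors[base, rng.integers(0, k, size=n_new)]  # one random neighbor of it
    gap = rng.random((n_new, 1))                   # uniform position on the segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Two anomalies on either side of normal data centred at the origin.
X_anom = np.array([[-10.0, 0.0], [10.0, 0.0]])
synthetic = smote_sketch(X_anom, n_new=100)
```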
SMOTE is really meant for cases where the minority class occupies a fairly specific region of the feature space, so that the synthetic points reinforce that region as the minority class's decision region. This doesn't seem to be your use case: you want to know which points "don't belong," per se.
This seems like a fairly nice use case for DBSCAN, a density-based clustering algorithm: points that are not within some distance, eps, of a sufficiently dense neighborhood are labeled as noise, i.e. as not belonging to any cluster.
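DBSCAN's noise criterion is simple enough to sketch directly in numpy: a point is a core point if at least min_samples points (itself included) lie within eps of it, and it is noise if it is not within eps of any core point. scikit-learn's sklearn.cluster.DBSCAN implements the full algorithm and labels noise points -1; in either case eps and min_samples have to be tuned to your data.

```python
import numpy as np

def dbscan_noise_mask(X, eps, min_samples):
    """Boolean mask of the points DBSCAN would label as noise."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    within = d <= eps                          # each point's neighborhood includes itself
    core = within.sum(axis=1) >= min_samples   # dense enough to be a core point
    # Core points are within eps of themselves, so this also covers them;
    # border points are within eps of at least one core point.
    reachable = within[:, core].any(axis=1)
    return ~reachable
```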

Why are rotation-invariant neural networks not used by the winners of popular competitions?

As is well known, the most popular modern CNNs (convolutional neural networks), such as VGG/ResNet (Faster R-CNN), SSD, YOLO, YOLO v2, DenseBox and DetectNet, are not rotation invariant (see the question "Are modern CNN (convolutional neural network) as DetectNet rotate invariant?").
It is also known that there are several neural networks for rotation-invariant object detection:
Rotation-Invariant Neoperceptron (2006): https://www.researchgate.net/publication/224649475_Rotation-Invariant_Neoperceptron
Learning rotation invariant convolutional filters for texture classification (2016): https://arxiv.org/abs/1604.06720
RIFD-CNN: Rotation-Invariant and Fisher Discriminative Convolutional Neural Networks for Object Detection (2016): http://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Cheng_RIFD-CNN_Rotation-Invariant_and_CVPR_2016_paper.html
Encoded Invariance in Convolutional Neural Networks (2014)
Rotation-invariant convolutional neural networks for galaxy morphology prediction: https://arxiv.org/abs/1503.07077
Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images (2016): http://ieeexplore.ieee.org/document/7560644/
We know that in image-detection competitions such as ImageNet, MS COCO and PASCAL VOC, network ensembles (several neural networks at once) are used, or networks that behave like an ensemble within a single net, such as ResNet (see "Residual Networks Behave Like Ensembles of Relatively Shallow Networks").
But do winners such as MSRA use rotation-invariant networks in their ensembles, and if not, why not? Why doesn't adding a rotation-invariant network to the ensemble improve accuracy for certain objects, such as aircraft, whose images are taken at different rotation angles?
This can be, for example:
aircraft photographed from the ground
or ground objects photographed from the air
So why are rotation-invariant neural networks not used by the winners of popular object-detection competitions?
The recent progress in image recognition, which came mainly from switching from classic feature selection plus a shallow learning algorithm to no feature selection plus a deep learning algorithm, wasn't caused only by the mathematical properties of convolutional neural networks. Of course, their ability to capture the same information with fewer parameters comes partly from their shift-invariance property, but recent research has shown that this is not the key to understanding their success.
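The parameter saving comes from weight sharing: a convolution reuses the same small kernel at every position. A rough count, for an assumed 32 x 32 feature map with 64 input and 64 output channels (numbers chosen only for illustration):

```python
def conv2d_params(k, c_in, c_out):
    """A k x k convolution: one kernel per (input, output) channel pair, shared
    across all spatial positions, plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

def dense_params(h, w, c_in, c_out):
    """A fully connected layer mapping an h x w x c_in map to an h x w x c_out map."""
    n_in, n_out = h * w * c_in, h * w * c_out
    return n_in * n_out + n_out

print(conv2d_params(3, 64, 64))       # 36928
print(dense_params(32, 32, 64, 64))   # 4295032832
```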
In my opinion, the main reason behind this success was the development of faster learning algorithms rather than more mathematically accurate ones, and that's why less attention has been paid to developing neural nets that are invariant to other properties.
Of course, rotation invariance is not ignored entirely. It is partly achieved through data augmentation, where you add slightly modified (e.g. rotated or rescaled) copies of an image to your dataset with the same label. As we can read in this fantastic book (Chapter 5.5.3, titled "Invariances"), these two approaches (more structure in the model vs. less structure plus data augmentation) are more or less equivalent.
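As a minimal sketch of rotation augmentation (numpy only, assuming square images so that 90-degree rotations keep the shape), each image is added again at 90, 180 and 270 degrees with its original label:

```python
import numpy as np

def augment_with_rotations(images, labels):
    """Return the images plus their 90/180/270-degree rotations, labels repeated.

    images: (n, H, W) or (n, H, W, C) with H == W (assumed square);
    labels: (n,). Rotation is in the (H, W) plane.
    """
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]  # k=0 is the original
    X = np.concatenate(rotated, axis=0)
    y = np.concatenate([labels] * 4, axis=0)
    return X, y
```

Arbitrary angles need interpolation (e.g. scipy.ndimage.rotate), and for detection the bounding boxes have to be transformed as well.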
Like @Alex, I'm also wondering why the community and scholars haven't paid much attention to rotation-invariant CNNs.
One possible cause, in my opinion, is that many scenarios don't need this property, especially in those popular competitions. As Rob mentioned, some natural pictures are already taken in a consistent horizontal (or vertical) orientation. For example, in face detection, many works align the picture so that the people are upright before feeding it to any CNN model. To be honest, this is the cheapest and most efficient approach for this particular task.
However, there do exist real-life scenarios that need rotation invariance. So I come to another guess: from the experts' (or researchers') point of view, this problem is not difficult. At the very least, we can use data augmentation to obtain some rotation invariance.
Lastly, thanks so much for your summary of the papers. I'd add one more paper, Group Equivariant Convolutional Networks (ICML 2016, G-CNN), and its implementation on GitHub by other people.
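The core idea of a G-CNN's first layer can be sketched in a few lines, restricted here to 90-degree rotations (the p4 group) and a single-channel image; this is an illustrative simplification, not the paper's or the linked repository's implementation. The input is correlated with the kernel rotated by 0, 90, 180 and 270 degrees, giving one feature map per rotation. Rotating the input then rotates every map and cyclically shifts the rotation axis, so pooling over everything (e.g. a max) is rotation invariant.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # numpy >= 1.20

def corr2d_valid(x, w):
    """'Valid' 2D cross-correlation of a single-channel image x with kernel w."""
    windows = sliding_window_view(x, w.shape)        # (H-k+1, W-k+1, k, k)
    return np.einsum("ijuv,uv->ij", windows, w)

def p4_lifting_conv(x, w):
    """Correlate x with w rotated by r * 90 degrees, r = 0..3.

    Returns shape (4, H-k+1, W-k+1): one feature map per rotation.
    """
    return np.stack([corr2d_valid(x, np.rot90(w, r)) for r in range(4)])

rng = np.random.default_rng(0)
x, w = rng.normal(size=(8, 8)), rng.normal(size=(3, 3))
a = p4_lifting_conv(x, w)
b = p4_lifting_conv(np.rot90(x), w)
# Equivariance: b[r] equals np.rot90(a[(r - 1) % 4]) for every r.
```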
Progress in object detection is mostly driven by the success of detection algorithms on world-famous benchmarks like PASCAL VOC and MS COCO. These are object-centric datasets where most objects are upright (potted plants, humans, horses, etc.), so data augmentation with left-right flips is often sufficient (for all we know, augmenting with rotated images such as upside-down flips could even hurt detection performance).
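Note that for detection, a left-right flip has to be applied to the boxes as well as to the pixels. A minimal numpy sketch, assuming boxes in (x_min, y_min, x_max, y_max) pixel coordinates with x in [0, W] (a common convention, but check the one your framework uses):

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Left-right flip an (H, W[, C]) image and its (n, 4) boxes.

    Assumed box format: (x_min, y_min, x_max, y_max), x in [0, W].
    """
    W = image.shape[1]
    flipped = image[:, ::-1]
    boxes = np.asarray(boxes, dtype=float)
    new = boxes.copy()
    new[:, 0] = W - boxes[:, 2]    # new x_min comes from the old x_max
    new[:, 2] = W - boxes[:, 0]    # new x_max comes from the old x_min
    return flipped, new            # y coordinates are unchanged
```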
Every year the entire community adopts the base algorithmic structure of the winning solution and builds on it (I am exaggerating a bit to make a point, but not by much).
Interestingly, less widely known topics such as oriented text detection and oriented vehicle detection in aerial imagery both need rotation-invariant features and rotation-equivariant detection pipelines (as in both of the Cheng articles you mentioned).
If you want to find literature and code in this area, you need to dive into these two domains. I can already give you a few pointers, like the DOTA challenge for aerial imagery or the ICDAR challenges for oriented text detection.
As @Marcin Mozejko said, CNNs are by nature translation invariant and not rotation invariant. How to incorporate perfect rotation invariance is an open problem; the few articles that deal with it have yet to become standard, even though some of them seem promising.
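Strictly speaking, the convolution itself is translation equivariant (shifting the input shifts the feature map), and pooling then adds some local invariance. A small numpy check, using wrap-around borders so that the shift property holds exactly:

```python
import numpy as np

def circular_corr2d(x, w):
    """2D cross-correlation with wrap-around borders: y[i, j] = sum w[u, v] x[i+u, j+v]."""
    y = np.zeros_like(x, dtype=float)
    for u in range(w.shape[0]):
        for v in range(w.shape[1]):
            y += w[u, v] * np.roll(x, shift=(-u, -v), axis=(0, 1))
    return y

rng = np.random.default_rng(1)
x, w = rng.normal(size=(6, 6)), rng.normal(size=(3, 3))
# Shifting the input shifts the output by the same amount ...
print(np.allclose(circular_corr2d(np.roll(x, (1, 2), axis=(0, 1)), w),
                  np.roll(circular_corr2d(x, w), (1, 2), axis=(0, 1))))   # True
# ... but rotating the input does not simply rotate the output.
print(np.allclose(circular_corr2d(np.rot90(x), w),
                  np.rot90(circular_corr2d(x, w))))                       # False
```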
My personal favorite for detection is the modification of Faster R-CNN recently proposed by Ma.
I hope that this direction of research will be investigated more and more once people get fed up with MS COCO and VOC.
What you could try is to take a state-of-the-art detector trained on MS COCO, like Faster R-CNN with NASNet from the TF Object Detection API, and see how its performance changes when you rotate the test image. In my opinion, it would be far from rotation invariant.
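A minimal harness for that experiment, assuming a hypothetical predict function that maps one image to a single label. For a real detector you would instead rotate the predicted boxes back and match them against the detections on the unrotated image (e.g. by IoU), which this sketch does not do.

```python
import numpy as np

def rotation_consistency(predict, images, ks=(1, 2, 3)):
    """Fraction of images whose prediction is unchanged after each 90-degree rotation.

    predict: hypothetical callable, one (H, W[, C]) image -> comparable label.
    Returns {angle_in_degrees: agreement_rate}.
    """
    base = [predict(img) for img in images]
    return {90 * k: float(np.mean([predict(np.rot90(img, k)) == b
                                   for img, b in zip(images, base)]))
            for k in ks}
```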
Rotation invariance is mostly a good thing, but not always. Objects can have different interpretations depending on their rotation; e.g. a rotated "1" might be difficult to distinguish from a "7".
First, let's acknowledge that introducing rotational invariance requires a static assumption about the distribution of angles. For example, another commenter on this page suggested rotating the kernel in 30-degree steps. That's equivalent to assuming that useful rotations in each layer are uniformly distributed over the rotation angles.
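To make the fixed-angle assumption concrete, here is a bank of kernels at 30-degree steps. Rotating an arbitrary learned kernel needs interpolation, so this sketch instead assumes an analytic kernel (a first derivative of a Gaussian, i.e. an oriented edge detector) that can be sampled at any angle; size and sigma are illustrative values.

```python
import numpy as np

def edge_kernel(theta, size=7, sigma=1.5):
    """First derivative of a Gaussian along direction theta (radians)."""
    r = np.arange(size) - size // 2
    yy, xx = np.meshgrid(r, r, indexing="ij")          # rows = y, columns = x
    along = xx * np.cos(theta) + yy * np.sin(theta)    # coordinate along theta
    return along * np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))

# 30-degree steps: 12 kernels on a uniform grid over the full circle,
# i.e. the same angles are assumed useful in every layer.
angles = np.deg2rad(np.arange(0, 360, 30))
bank = np.stack([edge_kernel(t) for t in angles])      # shape (12, 7, 7)
```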
In contrast, when the network learns rotated kernels itself, it can pick a different distribution of angles for each layer. An interesting research question is what distribution of rotation angles is implied by learned kernels. In any case, why would such learning flexibility be useful?
I suspect that the assumption of a uniform distribution might not be equally useful across all layers of a network. In the first few convolutional layers (edges and other basic shapes), it's likely true that the rotation angles are uniformly distributed. However, in the deep layers, this assumption might be less valid. If cars are almost always rotated within a small range of angles, then why waste compute and space on unlikely rotations?
However, the network won't learn the right distribution of angles if the training dataset is not sufficiently representative. Note that simply rotating a whole image (a form of data augmentation) is not the same as rotating an object relative to other objects in the same image. I suppose it comes down to how much you expect the training dataset to differ from the unobserved data to which the network has to generalize.
Interestingly, the human visual cortex is not fully rotation-invariant at the scale of major face features. See https://en.wikipedia.org/wiki/Thatcher_effect.

How many learning curves should I plot for a multi-class logistic regression classifier?

If I have K classes, do I have to plot K learning curves?
It seems impossible to me to calculate the training/validation error against all K theta vectors at once.
To clarify, the learning curve is a plot of the training and cross-validation/test set error (or cost) vs. training set size. This plot should allow you to see whether increasing the training set size improves performance. More generally, the learning curve allows you to identify whether your algorithm suffers from a bias (underfitting) or variance (overfitting) problem.
It depends. Learning curves do not concern themselves with the number of classes. As you said, a learning curve is a plot of training set and test set error, where that error is a single numerical value. That is all learning curves are.
That error can be anything you want: accuracy, precision, recall, F1 score etc. (even MAE, MSE and others for regression).
However, which error measure is appropriate depends on your specific problem, and that choice in turn affects how you should use learning curves.
Accuracy is well defined for any number of classes, so if you use this, a single plot should suffice.
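The K theta vectors do not need separate curves for accuracy: the multi-class prediction takes the argmax over all K scores, so each training-set size yields a single error. A self-contained sketch, assuming a softmax (multinomial) logistic regression trained by plain gradient descent and a feature matrix X that already includes a column of ones for the bias. With one-vs-all logistic regression the same argmax applies, since the sigmoid is monotonic.

```python
import numpy as np

def predict(Theta, X):
    """Multi-class prediction using all K theta vectors at once.

    Theta has shape (n_features, K), one column per class; the predicted
    class is the argmax of the K scores.
    """
    return np.argmax(X @ Theta, axis=1)

def fit_softmax(X, y, K, lr=0.1, epochs=1000):
    """Softmax (multinomial) logistic regression by batch gradient descent."""
    Theta = np.zeros((X.shape[1], K))
    Y = np.eye(K)[y]                           # one-hot targets, shape (m, K)
    for _ in range(epochs):
        S = X @ Theta
        S -= S.max(axis=1, keepdims=True)      # numerical stability
        P = np.exp(S)
        P /= P.sum(axis=1, keepdims=True)      # class probabilities
        Theta -= lr * X.T @ (P - Y) / len(X)   # gradient of mean cross-entropy
    return Theta

def learning_curve(X_tr, y_tr, X_val, y_val, K, sizes):
    """One (size, train_error, val_error) row per training-set size."""
    rows = []
    for m in sizes:
        Theta = fit_softmax(X_tr[:m], y_tr[:m], K)
        train_err = float(np.mean(predict(Theta, X_tr[:m]) != y_tr[:m]))
        val_err = float(np.mean(predict(Theta, X_val) != y_val))
        rows.append((m, train_err, val_err))
    return rows
```

Plotting train_err and val_err against m gives the single pair of learning curves.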
Precision and recall, however, are defined only for binary problems. You can generalize them somewhat (see here for example) by considering, for each class x, the binary problem with classes x and not x. In that case, you will probably want to plot learning curves for each class. This will also help you better identify problems with specific classes.
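A minimal numpy sketch of that one-vs-rest generalization (class x vs. not x). Computing it on the training and validation sets at each training-set size gives one pair of curves per class. Reporting 0.0 when a denominator is zero is an assumed convention.

```python
import numpy as np

def per_class_precision_recall(y_true, y_pred, K):
    """{class x: (precision, recall)} for the binary problem x vs. not x."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for x in range(K):
        tp = np.sum((y_pred == x) & (y_true == x))
        fp = np.sum((y_pred == x) & (y_true != x))
        fn = np.sum((y_pred != x) & (y_true == x))
        precision = tp / (tp + fp) if tp + fp else 0.0   # assumed 0.0 if undefined
        recall = tp / (tp + fn) if tp + fn else 0.0
        out[x] = (float(precision), float(recall))
    return out
```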
If you want to read more about performance metrics, I like this paper a lot.

Image Similarity - Deep Learning vs hand-crafted features

I am doing research in the field of computer vision, and am working on the problem of finding images that are visually similar to a query image. For example, finding t-shirts of similar colour with similar patterns (striped/checkered), or shoes of similar colour and shape, and so on.
I have explored hand-crafted image features such as color histograms, texture features, shape features (histogram of oriented gradients), SIFT and so on. I have also read up on the literature about deep neural networks (convolutional neural networks), which have been trained on massive amounts of data and are currently the state of the art in image classification.
I was wondering if the same features (extracted from the CNNs) can also be used for my project: finding fine-grained similarities between images. From what I understand, CNNs have learnt good representative features that help classify images; for example, be it a red shirt, a blue shirt or an orange shirt, the network is able to identify that the image is a shirt. However, it doesn't understand that an orange shirt looks more similar to a red shirt than a blue shirt does, and hence it is not able to capture these similarities.
Please correct me if I am wrong. I would like to know if there are any deep neural networks that capture these similarities and have proven to be superior to hand-crafted features. Thanks in advance.
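For reference, the color-histogram baseline mentioned in the question can be as simple as a joint RGB histogram compared by histogram intersection. The bin count and the intersection measure are illustrative choices, not the only options.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Joint RGB histogram of an (H, W, 3) uint8 image, normalized to sum to 1."""
    q = (img.astype(np.int64) * bins) // 256           # quantize each channel to [0, bins)
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return float(np.minimum(h1, h2).sum())
```

With hard bins, two different but perceptually close colors (say red and orange) can fall into different bins and score as completely dissimilar, which is one limitation of this simple baseline.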
For your task, a CNN is definitely worth a try!
Many researchers have used networks pretrained for image classification and obtained state-of-the-art results on fine-grained classification, for example classifying bird species or cars.
Now, your task is not classification, but it is related. You can think about similarity as some geometric distance between features, which are basically vectors. Thus, you may carry out some experiments computing the distance between the feature vectors for all your training images (the reference) and the feature vector extracted from the query image.
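The distance-based experiment described above could look like the sketch below. It assumes the feature vectors have already been extracted (for example, from a layer of a pretrained CNN) and stacked into a matrix; the extraction step itself is not shown.

```python
import numpy as np

def rank_by_similarity(query_feat, ref_feats):
    """Rank reference images by Euclidean distance to the query's feature vector.

    query_feat: (d,); ref_feats: (n, d), one row per reference image.
    Returns indices from most to least similar, and the sorted distances.
    """
    d = np.linalg.norm(ref_feats - query_feat, axis=1)
    order = np.argsort(d)
    return order, d[order]
```

If the features are L2-normalized first, ranking by Euclidean distance gives the same order as ranking by cosine similarity.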
CNN features extracted from the first layers of the net should relate more to color or other low-level visual traits than to more "semantic" ones.
Alternatively, there is some work on learning a similarity metric directly with a CNN; see here for example.
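One common formulation for learning a similarity metric directly (not necessarily the one used in the linked work) trains the network with a triplet loss: an anchor image's embedding should be closer to a similar (positive) image than to a dissimilar (negative) one by at least a margin. The loss itself, on precomputed embeddings, is just:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean of max(0, d(a, p) - d(a, n) + margin) over a batch.

    Each argument has shape (batch, d); d(., .) is the Euclidean distance.
    """
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(0.0, d_ap - d_an + margin)))
```

In practice the embeddings come from the CNN being trained, and the loss is minimized in a deep-learning framework, which is not shown here.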
This is a little bit outdated, but it can still be useful for other people. Yes, CNNs can be used for image similarity; I have used them for this before. As Flavio pointed out, for a simple start you can use a pretrained CNN of your choice, such as AlexNet or GoogLeNet, as a feature extractor. You can then compare the features by distance: similar pictures will have a smaller distance between their feature vectors.

Accelerated SVM training for HOG algorithm

Let's say I have a perfect 3D model of the rigid object I am looking for.
I want to find this object in a scene image using the histogram of oriented gradients (HOG) algorithm.
One way to train my SVM would be to render this object on top of a bunch of random backgrounds, in order to generate the positive training examples.
But, is there a faster, more direct way to use the model to train the SVM? One that doesn't involve rendering it multiple times?
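For reference, the rendering-based approach from the question boils down to alpha-compositing each rendered view onto random background crops. A minimal numpy sketch, assuming the renderer outputs an RGBA image with values in [0, 1]; HOG extraction and SVM training are not shown.

```python
import numpy as np

def composite(obj_rgba, background, top, left):
    """Paste a rendered RGBA object onto an RGB background at (top, left).

    obj_rgba: (h, w, 4) floats in [0, 1], alpha in the last channel;
    background: (H, W, 3) floats in [0, 1]. Returns a new array.
    """
    out = background.copy()
    h, w = obj_rgba.shape[:2]
    alpha = obj_rgba[..., 3:4]                   # keep a trailing axis for broadcasting
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * obj_rgba[..., :3] + (1 - alpha) * region
    return out
```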
