Timeseries denoising and downsampling - time-series

What are some different denoising techniques for time series? What conditions have to be in place to downsample a time series without loss of information?

It depends on what kind of time-series data you want to denoise, since denoising techniques differ considerably from one domain to another.
For example, to denoise audio data in the time domain, you can use the WebRTC noise cancellation module or apply one of the many other noise cancellation algorithms to your audio.
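For a generic numeric series (as opposed to audio), here is a minimal sketch of one common baseline: a moving-average low-pass filter followed by keeping every k-th sample. The function names and the synthetic sine data are made up for illustration. On the downsampling part of the question: by the Nyquist-Shannon sampling theorem, dropping samples loses no information only if the series has no frequency content at or above half the new sampling rate, which is why a low-pass filter is normally applied first.

```python
import math
import random

def moving_average(series, window):
    """Smooth a series with a centered moving average (a simple low-pass filter)."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed

def downsample(series, factor):
    """Keep every `factor`-th sample; only safe once high frequencies are removed."""
    return series[::factor]

def mse(a, b):
    """Mean squared error between two equally long series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
# Hypothetical data: a slow sine wave (period of 100 samples) plus Gaussian noise.
clean = [math.sin(2 * math.pi * t / 100) for t in range(400)]
noisy = [c + random.gauss(0, 0.3) for c in clean]

denoised = moving_average(noisy, window=9)
reduced = downsample(denoised, factor=4)   # new period is 25 samples, well below Nyquist

print(mse(noisy, clean), mse(denoised, clean), len(reduced))
```

The window length trades noise reduction against blurring of the signal itself, so it has to be chosen per dataset.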

Related

SMOTE oversampling for anomaly detection using a classifier

I have sensor data and I want to do live anomaly detection: use LOF on the training set to detect anomalies, then feed the labeled data to a classifier to classify new data points. I thought about using SMOTE because I want more anomalous points in the training data to overcome the class imbalance, but the issue is that SMOTE created many points that lie inside the normal range.
How can I do oversampling without creating samples in the normal data range?
[Figure: the data before applying SMOTE]
[Figure: the data after SMOTE]
SMOTE linearly interpolates synthetic points between a minority-class sample and its k nearest minority-class neighbors. This means you end up with new points lying between a sample and its neighbors. When the minority samples are scattered all over the place, as they are here, it makes sense that synthetic points get created in the middle, inside the normal range.
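To make the interpolation concrete, here is a small sketch of the core SMOTE step only. The readings and the 40-60 "normal" range are hypothetical, and a real SMOTE implementation picks the neighbor from the k nearest minority samples rather than using a hard-coded pair.

```python
import random

def smote_point(sample, neighbor, rng):
    """Create one synthetic point on the line segment between two minority samples."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]

rng = random.Random(0)
# Hypothetical 1-D sensor readings: normal values sit near 50,
# anomalies are either very low or very high.
low_anomaly = [5.0]
high_anomaly = [95.0]

synthetic = [smote_point(low_anomaly, high_anomaly, rng)[0] for _ in range(1000)]
inside_normal = sum(1 for v in synthetic if 40.0 <= v <= 60.0)
print(f"{inside_normal} of {len(synthetic)} synthetic points fall in the normal range 40-60")
```

Because the two anomalies sit on opposite sides of the normal data, a sizeable share of the interpolated points lands right in the normal range.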
SMOTE should really be used to identify more specific regions in the feature space as the decision region for the minority class. This doesn't seem to be your use case. You want to know which points "don't belong," per se.
This seems like a fairly nice use case for DBSCAN, a density-based clustering algorithm that labels as noise any point that has too few neighbors within some distance, eps, and is not within eps of a dense (core) point.
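As a rough illustration of how DBSCAN separates outliers from dense regions, here is a minimal pure-Python sketch. The data, eps and min_samples values are assumptions chosen for the example, not tuned for your sensor.

```python
import math

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, with -1 meaning noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1               # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Hypothetical 1-D readings: a dense band from 45 to 55 plus two isolated values.
points = [(50.0 + 0.5 * k,) for k in range(-10, 11)] + [(5.0,), (95.0,)]
labels = dbscan(points, eps=1.0, min_samples=4)
anomalies = [p[0] for p, label in zip(points, labels) if label == -1]
print(anomalies)
```

In practice you would normally use a library implementation such as scikit-learn's DBSCAN rather than this quadratic-time sketch.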

Altering trained images to train neural network

I am currently trying to make a program to differentiate rotten oranges from edible oranges based solely on their external appearance. To do this, I am planning to train a Convolutional Neural Network on rotten oranges and normal oranges. After some searching I could only find one database of approx. 150 rotten oranges and 150 normal oranges on a black background (http://www.cofilab.com/downloads/). Obviously, a machine learning model will need at least a few thousand oranges to achieve an accuracy above 90 or so percent. However, can I alter these 150 oranges in some way to produce more photos of oranges? By alter, I mean adding different shades of orange to the citrus fruit to make a "different orange." Would this be an effective method of training a neural network?
It is a very good way to increase the amount of data you have. What you do depends on your data. For example, if you are training on data obtained from a sensor, you may want to add some noise to the training data to enlarge your dataset. After all, you can expect some noise coming from the sensor later on.
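For the sensor case, a minimal sketch of noise injection could look like this. The readings, the noise level sigma and the helper name jitter are all hypothetical; sigma should roughly match the noise you expect from the real sensor.

```python
import random

def jitter(readings, sigma, copies, seed=0):
    """Return `copies` noisy variants of a sensor trace (Gaussian noise, std `sigma`)."""
    rng = random.Random(seed)
    return [[r + rng.gauss(0.0, sigma) for r in readings] for _ in range(copies)]

trace = [20.1, 20.3, 20.2, 20.6, 21.0]   # hypothetical temperature readings
augmented = jitter(trace, sigma=0.05, copies=3)
dataset = [trace] + augmented            # original plus 3 noisy variants
print(len(dataset))
```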
Assuming that you will train on images, here is a very good GitHub repository that provides the means to use those techniques. This Python library helps you augment images for your machine learning projects. It converts a set of input images into a new, much larger set of slightly altered images.
Link: https://github.com/aleju/imgaug
Features:
Most standard augmentation techniques available.
Techniques can be applied to both images and keypoints/landmarks on images.
Define your augmentation sequence once at the start of the experiment, then apply it many times.
Define flexible stochastic ranges for each augmentation, e.g. "rotate each image by a value between -45 and 45 degrees" or "rotate each image by a value sampled from the normal distribution N(0, 5.0)".
Easily convert all stochastic ranges to deterministic values to augment different batches of images in the exactly identical way (e.g. images and their heatmaps).
Data augmentation is what you are looking for. In your case you can do several things:
Apply filters to get a slightly different image; as has been said, you can use Gaussian blur.
Cut out the orange and place it on different backgrounds.
Scale the oranges by different scale factors.
Rotate the images.
Create synthetic rotten oranges.
Mix different combinations of all of the above. With this kind of augmentation you can easily create thousands of different oranges.
I did something like that with a dataset of 12,000 images, from which I can create 630,000 samples.
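To see how quickly combinations multiply, here is a small sketch that enumerates augmentation "recipes". The parameter values are hypothetical; for comparison, the 12,000 to 630,000 figure above works out to about 52 variants per image.

```python
from itertools import product

# Hypothetical parameter choices for each augmentation.
rotations = [0, 90, 180, 270]        # degrees
scales = [0.8, 1.0, 1.2]
flips = [False, True]
blur_sigmas = [0.0, 1.0]

# Every combination of the choices above is one augmentation recipe.
recipes = list(product(rotations, scales, flips, blur_sigmas))
per_image = len(recipes)             # 4 * 3 * 2 * 2 = 48 variants per image
print(per_image, 150 * per_image)    # variants for a 150-image class
```

Each recipe would then be applied to every source image, so even a handful of options per augmentation turns 150 images into several thousand.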
That is indeed a good way to increase your data set. You can, for example, apply Gaussian blur to the images. They will become blurred, but different from the original. You can invert the images too. Or, as a last resort, look for new images and apply the techniques mentioned above.
Data augmentation is a really good way to boost the training set, but on its own it is still not enough to train a deep network end to end, because the network is likely to overfit. You should look at transfer learning, where you take a pretrained model such as Inception, which is trained on the ImageNet dataset, and fine-tune it for your problem. Since you only have to learn the parameters required to classify your use case, it is possible to achieve good accuracy with relatively little training data. I have hosted a demo of classification with this technique here. Try it out with your dataset and see if it helps. The demo takes care of the pretrained model as well as data augmentation for the dataset that you upload.

image augmentation algorithms for preparing deep learning training set

To prepare large datasets for training deep learning-based image classification models, we usually have to rely on image augmentation methods. What are the usual image augmentation algorithms, and are there any considerations when choosing among them?
The literature on data augmentation is very large and depends heavily on the application.
The first things that come to my mind are the galaxy competition's rotations and Jasper Snoek's data augmentation.
But really, every paper has its own tricks for getting good scores on particular datasets, for example stretching the image to a specific size before cropping it, all applied in a very specific order.
More practically, to train models on the likes of CIFAR or ImageNet, use random crops and random contrast and brightness perturbations in addition to the obvious flips and noise addition.
Look at the CIFAR-10 tutorial on the TF website; it is a good start. Plus, TF now has random_crop_and_resize(), which is quite useful.
EDIT: The papers I am referencing here and there.
It depends on the problem you have to address, but most of the time you can do:
Rotate the images
Flip the image (X or Y symmetry)
Add noise
All the previous at the same time.
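The list above can be sketched in a few lines. Plain nested lists stand in for image arrays here, and the helper names are hypothetical; in a real project you would use NumPy arrays or an augmentation library instead.

```python
import random

def rotate90(img):
    """Rotate a 2-D grid of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]

def add_noise(img, sigma, rng):
    """Add Gaussian noise to every pixel and clamp to the 0-255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

img = [[0, 50],
       [100, 150]]                     # tiny hypothetical grayscale image

rng = random.Random(0)
# "All the previous at the same time": chain the transforms.
combined = add_noise(flip_horizontal(rotate90(img)), sigma=5, rng=rng)
print(rotate90(img), flip_horizontal(img), combined)
```

Whether a given transform is safe depends on the labels: a flipped orange is still an orange, but a flipped digit or road sign may not mean the same thing.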

What is the difference between image denoising and image filtering

I'm starting to learn image processing, but I'm a little confused about "image filtering" and "image denoising". I know they both aim to reduce the noise in an image. I thought "image filtering" was the same as "image denoising".
But is there any difference between these two terms?
Would you please tell me the answer?
Denoising
is an operation that specifically removes a particular kind of noise from the source dataset (usually using filtering in combination with other operations).
Filtering
is applying one or more specific filters to a dataset, such as an FIR (finite impulse response) filter or any kind of convolution, and this operation does not necessarily remove noise.
For example:
Gamma correction is also a filtering technique and does not remove noise at all.
Edge detectors are filters, and they usually emphasize noise.
Erosion/dilation can also create new noise in the data.
Low-pass, band-pass, or smoothing filters reduce noise by removing specific frequency ranges from the dataset, but that is only a by-product that denoising techniques exploit.
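A quick 1-D sketch of that distinction: the same filtering routine reduces noise with a smoothing kernel and amplifies it with an edge-detection kernel. The kernels and the synthetic noise are assumptions picked for illustration.

```python
import random
import statistics

def apply_filter(signal, kernel):
    """Slide a kernel over a 1-D signal (no padding, kernel not flipped)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

random.seed(0)
# Pure Gaussian noise on top of a flat (zero) underlying signal.
noise = [random.gauss(0, 1) for _ in range(5000)]

smoothed = apply_filter(noise, [1 / 3, 1 / 3, 1 / 3])  # low-pass (mean) filter
edges = apply_filter(noise, [-1, 1])                   # first difference, a simple edge detector

print(statistics.pvariance(noise),
      statistics.pvariance(smoothed),
      statistics.pvariance(edges))
```

For independent noise the mean filter cuts the variance to roughly a third, while the difference filter roughly doubles it, so both are filters but only one of them helps with denoising.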

How to detect the voice from an audio stream

I need to determine when someone speaks in an audio stream. I applied the Hamming window and calculated the FFT. How do I detect the human voice from here?
If you want to experiment with your own voice activity detection algorithms, an FFT can be used as an initial stage. Next you might want to try subtracting any characterized stationary spectral noise background. Then you could try using the modified FFT results to calculate a cepstrum (or some weighted cepstral coefficients) for feature extraction. You could then do some statistical pattern matching on whatever feature vectors you decided to extract, and feed the results to a decision algorithm.
Each of the above steps has likely been a research topic, and a good implementation might involve studying dozens of published research papers, which perhaps can be found in your university library.
You don't need to do an FFT for this; you need to implement a Voice Activity Detection algorithm.
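As a starting point, here is a very crude energy-threshold VAD sketch that works in the time domain. It detects "louder than the background", not voice specifically, and the frame length, threshold ratio and synthetic signal are all assumptions; robust detectors add spectral or cepstral features on top of this.

```python
import math
import random

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_activity(samples, frame_len, noise_frames, ratio):
    """Flag frames whose energy exceeds `ratio` times the estimated noise floor.

    The noise floor is the mean energy of the first `noise_frames` frames,
    which are assumed to contain background noise only.
    """
    energies = frame_energies(samples, frame_len)
    floor = sum(energies[:noise_frames]) / noise_frames
    return [e > ratio * floor for e in energies]

random.seed(0)
rate = 8000  # samples per second
# 0.5 s of low-level noise, 0.5 s of a 200 Hz tone standing in for
# voiced speech, then 0.5 s of noise again.
quiet = [random.gauss(0, 0.01) for _ in range(rate // 2)]
tone = [0.5 * math.sin(2 * math.pi * 200 * t / rate) + random.gauss(0, 0.01)
        for t in range(rate // 2)]
signal = quiet + tone + quiet

flags = detect_activity(signal, frame_len=400, noise_frames=5, ratio=10.0)
print(flags)
```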

Resources