interpretation autocorrelation and partial autocorrelation plot - autocorrelation

For a project I need to use the SARIMA model. I have read that before I train the model I need to perform a Dickey Fuller test to confirm that the data is stationary and I also read that I should plot the autocorrelation and partial autocorrelation to determine the best parameters for the model.
I would like some help with interpreting the plots.
Image of autocorrelation plots
It is clear that the 'lollipops' represent the correlation between the present value and the lag/k interval value, I also understand that the light blue area depicts the 95% confidence interval with significance testing but the relationship between the lollipops and the light blue area is unclear to me.
How can I use this plot to determine the parameters which I should use to build my SARIMA model?

Related

Pattern Recognition using Machine learning

I have many evolution curves (on time), of a system as images.
These evolution curves are plotted when the system behave in a normal way ('ok').
I want to train a model, which learn the shapes of the curves (or parts of the shapes) when it behave in a normal way, so it will be able to classify new curves to normal (or abnormal).
Any ideas of the model to use, or how to proceed ?
Thank you
You can perform PCA, and then classify. Also look for functional data analysis
Here is a nice getting started guide with PCA
You can start with labeling (annotating) the images. The label can be as Normal/ Not Normal as 0/1 or as many classes you want to divide the data into.
Since it's a chart so the orientation is important, a wrong orientation can destroy the meaning of the image.
So make an algorithm which always orient the chart in the same way while reading.
Now that the labeling is done you need to train these images for correct classification.
Augment the data if needed
Find a image classification model
Use the trained weights
feed you images and annotations in the desired format
Train the model
Check for the output error or classification errors.
Create an evaluation matrix like confusion matrix in case of classification.
If the model is right and training is properly done you will get good accuracy.
Otherwise repeat the steps.
This is just an overview, with this you can start towards your goal.

Why doesn't the hidden state of a neuron network provide better dimension reduction result than original input?

I just read a great post here. I am curious about content of "An example with images" in that post. If the hidden states mean a lot of features of the original picture and getting closer to final result, using dimension reduction on hidden states should provide better result than the original raw pixels, I think.
Hence, I tried it on mnist digits with 2 hidden layers of 256 unit NN, using T-SNE for dimension reduction; the result is far from ideal. From left to right, top to bot, they are raw pixels, second hidden layer and final prediction. Can anyone explain that?
BTW, the accuracy of this model is around 94.x%.
You have ten classes, and as you mentioned, your model is performing well on this dataset - so in this 256 dimensional space - the classes are separated well using linear subspaces.
So why T-SNE projections don't have this property?
One trivial answer which comes to my mind is that projecting a highly dimensional set to two dimensions may lose the linear separation property. Consider following example : a hill where one class is at its peak and second - on lower height levels around. In three dimensions these classes are easily separated by a plane but one can easily find a two dimensional projection which doesn't have that property (e.g. projection in which you are loosing the height dimension).
Of course T-SNE is not such linear projections but it's main purpose is to preserve a local structure of data, so that general property like linear separation property might be easly losed when using this approach.

Dimension Reduction of Feature in Machine Learning

Is there any way to reduce the dimension of the following features from 2D coordinate (x,y) to one dimension?
Yes. In fact, there are infinitely many ways to reduce the dimension of the features. It's by no means clear, however, how they perform in practice.
A feature reduction usually is done via a principal component analysis (PCA) which involves a singular value decomposition. It finds the directions with highest variance -- that is, those direction in which "something is going on".
In your case, a PCA might find the black line as one of the two principal components:
The projection of your data onto this one-dimensional subspace than yields the reduced form of your data.
Already with the eye one can see that on this line the three feature sets can be separated -- I coloured the three ranges accordingly. For your example, it is even possible to completely separate the data sets. A new data point then would be classified according to the range in which its projection onto the black line lies (or, more generally, the projection onto the principal component subspace) lies.
Formally, one could obtain a division with further methods that use the PCA-reduced data as input, such as for example clustering methods or a K-nearest neighbour model.
So, yes, in case of your example it could be possible to make such a strong reduction from 2D to 1D, and, at the same time, even obtain a reasonable model.

How do I make a U-matrix?

How exactly is an U-matrix constructed in order to visualise a self-organizing-map? More specifically, suppose that I have an output grid of 3x3 nodes (that have already been trained), how do I construct a U-matrix from this? You can e.g. assume that the neurons (and inputs) have dimension 4.
I have found several resources on the web, but they are not clear or they are contradictory. For example, the original paper is full of typos.
A U-matrix is a visual representation of the distances between neurons in the input data dimension space. Namely you calculate the distance between adjacent neurons, using their trained vector. If your input dimension was 4, then each neuron in the trained map also corresponds to a 4-dimensional vector. Let's say you have a 3x3 hexagonal map.
The U-matrix will be a 5x5 matrix with interpolated elements for each connection between two neurons like this
The {x,y} elements are the distance between neuron x and y, and the values in {x} elements are the mean of the surrounding values. For example, {4,5} = distance(4,5) and {4} = mean({1,4}, {2,4}, {4,5}, {4,7}). For the calculation of the distance you use the trained 4-dimensional vector of each neuron and the distance formula that you used for the training of the map (usually Euclidian distance). So, the values of the U-matrix are only numbers (not vectors). Then you can assign a light gray colour to the largest of these values and a dark gray to the smallest and the other values to corresponding shades of gray. You can use these colours to paint the cells of the U-matrix and have a visualized representation of the distances between neurons.
Have also a look at this web article.
The original paper cited in the question states:
A naive application of Kohonen's algorithm, although preserving the topology of the input data is not able to show clusters inherent in the input data.
Firstly, that's true, secondly, it is a deep mis-understanding of the SOM, thirdly it is also a mis-understanding of the purpose of calculating the SOM.
Just take the RGB color space as an example: are there 3 colors (RGB), or 6 (RGBCMY), or 8 (+BW), or more? How would you define that independent of the purpose, ie inherent in the data itself?
My recommendation would be not to use maximum likelihood estimators of cluster boundaries at all - not even such primitive ones as the U-Matrix -, because the underlying argument is already flawed. No matter which method you then use to determine the cluster, you would inherit that flaw. More precisely, the determination of cluster boundaries is not interesting at all, and it is loosing information regarding the true intention of building a SOM. So, why do we build SOM's from data?
Let us start with some basics:
Any SOM is a representative model of a data space, for it reduces the dimensionality of the latter. For it is a model it can be used as a diagnostic as well as a predictive tool. Yet, both cases are not justified by some universal objectivity. Instead, models are deeply dependent on the purpose and the accepted associated risk for errors.
Let us assume for a moment the U-Matrix (or similar) would be reasonable. So we determine some clusters on the map. It is not only an issue how to justify the criterion for it (outside of the purpose itself), it is also problematic because any further calculation destroys some information (it is a model about a model).
The only interesting thing on a SOM is the accuracy itself viz the classification error, not some estimation of it. Thus, the estimation of the model in terms of validation and robustness is the only thing that is interesting.
Any prediction has a purpose and the acceptance of the prediction is a function of the accuracy, which in turn can be expressed by the classification error. Note that the classification error can be determined for 2-class models as well as for multi-class models. If you don't have a purpose, you should not do anything with your data.
Inversely, the concept of "number of clusters" is completely dependent on the criterion "allowed divergence within clusters", so it is masking the most important thing of the structure of the data. It is also dependent on the risk and the risk structure (in terms of type I/II errors) you are willing to take.
So, how could we determine the number classes on a SOM? If there is no exterior apriori reasoning available, the only feasible way would be an a-posteriori check of the goodness-of-fit. On a given SOM, impose different numbers of classes and measure the deviations in terms of mis-classification cost, then choose (subjectively) the most pleasing one (using some fancy heuristics, like Occam's razor)
Taken together, the U-matrix is pretending objectivity where no objectivity can be. It is a serious misunderstanding of modeling altogether.
IMHO it is one of the greatest advantages of the SOM that all the parameters implied by it are accessible and open for being parameterized. Approaches like the U-matrix destroy just that, by disregarding this transparency and closing it again with opaque statistical reasoning.

How to match texture similarity in images?

What are the ways in which to quantify the texture of a portion of an image? I'm trying to detect areas that are similar in texture in an image, sort of a measure of "how closely similar are they?"
So the question is what information about the image (edge, pixel value, gradient etc.) can be taken as containing its texture information.
Please note that this is not based on template matching.
Wikipedia didn't give much details on actually implementing any of the texture analyses.
Do you want to find two distinct areas in the image that looks the same (same texture) or match a texture in one image to another?
The second is harder due to different radiometry.
Here is a basic scheme of how to measure similarity of areas.
You write a function which as input gets an area in the image and calculates scalar value. Like average brightness. This scalar is called a feature
You write more such functions to obtain about 8 - 30 features. which form together a vector which encodes information about the area in the image
Calculate such vector to both areas that you want to compare
Define similarity function which takes two vectors and output how much they are alike.
You need to focus on steps 2 and 4.
Step 2.: Use the following features: std() of brightness, some kind of corner detector, entropy filter, histogram of edges orientation, histogram of FFT frequencies (x and y directions). Use color information if available.
Step 4. You can use cosine simmilarity, min-max or weighted cosine.
After you implement about 4-6 such features and a similarity function start to run tests. Look at the results and try to understand why or where it doesnt work. Then add a specific feature to cover that topic.
For example if you see that texture with big blobs is regarded as simmilar to texture with tiny blobs then add morphological filter calculated densitiy of objects with size > 20sq pixels.
Iterate the process of identifying problem-design specific feature about 5 times and you will start to get very good results.
I'd suggest to use wavelet analysis. Wavelets are localized in both time and frequency and give a better signal representation using multiresolution analysis than FT does.
Thre is a paper explaining a wavelete approach for texture description. There is also a comparison method.
You might need to slightly modify an algorithm to process images of arbitrary shape.
An interesting approach for this, is to use the Local Binary Patterns.
Here is an basic example and some explanations : http://hanzratech.in/2015/05/30/local-binary-patterns.html
See that method as one of the many different ways to get features from your pictures. It corresponds to the 2nd step of DanielHsH's method.

Resources