Identify time-series forecasting algorithm [closed] - machine-learning

I'm trying to build an algorithm in C# based on these videos (CLICK!). My question is not related to the coding part of the task.
I'm trying to gain a deeper understanding of this algorithm, since it is perfect for my assignment, but the YouTuber never identifies it by name. I'd like any information you can give me -- a name, resources, etc.
Edit: It's a time-series decomposition model -- specifically, classical multiplicative decomposition.
Steps:
1. Calculate a moving average equal to the length of the season to identify the trend cycle.
2. Center the moving average if the season length is an even number.
3. Calculate the actuals as a proportion of the centered moving average to obtain the seasonal index for each period.
4. Adjust the seasonal indexes so that they total the number of periods.
5. Deseasonalize the time series by dividing it by the seasonal indexes.
6. Estimate the trend-cycle regression using the deseasonalized data.
7. Multiply the fitted trend values by the appropriate seasonal indexes to compute the fitted values.
8. Calculate the errors and measure the accuracy of the fit against the known actual series.
9. If cyclical factors are important, calculate cyclical indexes.
10. Check for outliers, adjust the actual series, and repeat steps 1 to 9 if necessary.
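For reference, here is a minimal NumPy sketch of those steps (function and variable names are my own, not from the video; it assumes an even season length such as 12 for monthly data and a linear trend, and it skips the optional cyclical/outlier steps 9-10):

    import numpy as np

    def multiplicative_decomposition(y, season=12):
        """Classical multiplicative decomposition: y_t = T_t * S_t * E_t."""
        y = np.asarray(y, dtype=float)
        n = len(y)

        # Steps 1-2: moving average of one season, centered because the
        # season length is assumed even (e.g. 12 for monthly data).
        ma = np.convolve(y, np.ones(season) / season, mode="valid")
        cma = (ma[:-1] + ma[1:]) / 2      # centered MA, aligned with y[offset:]
        offset = season // 2

        # Step 3: actuals as a proportion of the centered moving average.
        ratios = y[offset:offset + len(cma)] / cma
        periods = np.arange(offset, offset + len(cma)) % season
        seasonal = np.array([ratios[periods == k].mean() for k in range(season)])

        # Step 4: make the seasonal indexes total the number of periods.
        seasonal *= season / seasonal.sum()

        # Step 5: deseasonalize by dividing by the seasonal indexes.
        deseason = y / seasonal[np.arange(n) % season]

        # Step 6: fit a (linear) trend to the deseasonalized data.
        t = np.arange(1, n + 1)
        slope, intercept = np.polyfit(t, deseason, 1)
        trend = intercept + slope * t

        # Step 7: reseasonalize the fitted trend to get the fitted values.
        fitted = trend * seasonal[np.arange(n) % season]

        # Step 8: errors and a simple accuracy measure (MAPE).
        mape = np.mean(np.abs((y - fitted) / y)) * 100
        return fitted, seasonal, mape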

It is a well-known, well-documented, identifiable algorithm.
One of the comments to the video says "What you did is Moving Average, can you please show us how to do Auto Regressive (AR) and Auto Regressive Moving Average (ARMA) if it is possible in Excel?"
You can learn about MA, AR, and AR(I)MA models from this book: https://otexts.com/fpp2/

Deep learning for 3d datasets, what is the best way to prepare a pipeline? and which algorithm would be best? [closed]

So I have a point cloud, or a 3D grid, where each cell contains the following data (features/attributes): a grid type (building, tree, grass, soil, or blank). Cells of type "building" additionally have sub-attributes such as conductivity and reflectivity values. Besides the per-cell attributes, there are a few other attributes that apply to the whole data set, such as wind speed, temperature, etc. I would like to know which deep learning algorithm would be helpful for predicting the air temperature in each grid cell (in the x, y direction only) based on the 3D attributes explained above, and what the best way to prepare a pipeline for this would be. The goal is to predict air temperature values when I feed the trained model a data set that has the geometric model info, the wind direction, and the wind speed.
Here is an example image out of the 300 I have (I have the images, plus a data set of all the attributes of each grid cell and the air temperature values). The model sits inside a cube of 60x60x60 cells: when a cell contains a building, its space type is set to "building"; when a cell contains air, its space type is set to "blank"; and so on. As mentioned, each "building" cell carries additional sub-attributes. The values I'm trying to predict are the air temperatures at each blank cell (around the buildings) on an x, y plane (say at height z=2); in this image, the x, y plane is the colored plane. I have the values as numbers, not just colored planes.
Also, here is a small portion of the data I have and the results (y values = air temperature).
The fact that your problem is 3D does not mean your dataset has to be.
This seems to me like a very straightforward machine learning problem: you could reformat your data into one dataset where each row contains the cell location (x, y, z), the cell type, the sub-attributes, and the target, temperature.
The preprocessing required will depend on the kind of model you choose; some support categorical input, others don't.
You can use deep learning if you prefer, but such models typically don't work with categorical variables, so you'll have to encode all the textual information, and 300 instances is very little data for training that kind of model.
You might have more luck with a Random Forest as a first step, as in the sketch below.
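As a rough sketch of that tabular approach with scikit-learn (the file name and column names here are placeholders, not from the question):

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    # One row per grid cell; scene-wide attributes (wind speed/direction)
    # are simply repeated on every row of the same scene.
    df = pd.read_csv("cells.csv")     # placeholder file: x, y, z, cell_type,
                                      # conductivity, reflectivity,
                                      # wind_speed, wind_dir, air_temp
    X = df.drop(columns=["air_temp"])
    y = df["air_temp"]

    # One-hot encode the categorical cell type; numeric columns pass through.
    pre = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["cell_type"])],
        remainder="passthrough",
    )
    model = Pipeline([
        ("pre", pre),
        ("rf", RandomForestRegressor(n_estimators=300, random_state=0)),
    ])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    model.fit(X_train, y_train)
    print("R^2 on held-out cells:", model.score(X_test, y_test))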

Pytorch - Should 'CenterCrop' be applied to the test set? Does this count as cheating? [closed]

I'm learning image classification with PyTorch. I found that the code for some papers applies 'CenterCrop' to both the train set and the test set, e.g. resize to a larger size, then apply CenterCrop to obtain a smaller size. The smaller size is a standard size in this research direction.
In my experience, applying CenterCrop to the test set gives a significant improvement (e.g. 1% or 2%) compared to not applying it.
Since it is used in top conference papers, this confuses me. Should CenterCrop be applied to the test set, or does that count as cheating? In addition, should I apply any data augmentation to the test set other than 'Resize' and 'Normalization'?
Thank you for your answer.
That is not cheating. You can apply any augmentation as long as the label is not used.
In image classification, people sometimes use a FiveCrop+Reflection technique: take five crops (Center, TopLeft, TopRight, BottomLeft, BottomRight) and their reflections as augmentations, predict class probabilities for each crop, and average the results. This typically gives some performance boost at the cost of 10x the running time.
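As a sketch, torchvision's TenCrop implements exactly this five-crops-plus-flips scheme; image and model below are placeholders for a PIL image and a trained classifier:

    import torch
    from torchvision import transforms
    from torchvision.transforms import functional as TF

    tta = transforms.Compose([
        transforms.Resize(256),
        transforms.TenCrop(224),   # 5 crops (center + corners) + horizontal flips
        transforms.Lambda(
            lambda crops: torch.stack([TF.to_tensor(c) for c in crops])),
    ])

    crops = tta(image)             # image: a PIL image -> (10, 3, 224, 224)
    model.eval()
    with torch.no_grad():
        # Average the class probabilities over the 10 views.
        probs = model(crops).softmax(dim=1).mean(dim=0)
    pred = probs.argmax().item()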
In segmentation, people use a similar test-time augmentation called "multi-scale testing", which resizes the input image to several scales before feeding it to the network. The predictions are again averaged.
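A minimal sketch of such multi-scale test-time augmentation in PyTorch (model is again a placeholder, assumed to return per-pixel class logits):

    import torch
    import torch.nn.functional as F

    def multiscale_predict(model, image, scales=(0.75, 1.0, 1.25)):
        """image: (1, 3, H, W) tensor; model returns (1, C, h, w) logits."""
        h, w = image.shape[-2:]
        probs = 0.0
        with torch.no_grad():
            for s in scales:
                # Resize the input, predict, and resize the prediction back.
                x = F.interpolate(image, scale_factor=s, mode="bilinear",
                                  align_corners=False)
                out = model(x)
                out = F.interpolate(out, size=(h, w), mode="bilinear",
                                    align_corners=False)
                probs = probs + out.softmax(dim=1)
        return probs / len(scales)   # averaged per-pixel class probabilities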
If you do use this kind of augmentation, report it when you compare against other methods, for a fair comparison.

Feature scaling (normalization) in multiple regression analysis with normal equation method? [closed]

I am doing linear regression with multiple features. I decided to use the normal equation method to find the coefficients of the linear model. If we use gradient descent for linear regression with multiple variables, we typically do feature scaling in order to speed up gradient descent's convergence. For now, though, I am going to use the normal equation formula:

    theta = (X^T X)^(-1) X^T y

I have two contradictory sources of information. The first states that no feature scaling is required for the normal equation. The other says that feature normalization has to be done.
Sources:
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex3/ex3.html
http://puriney.github.io/numb/2013/07/06/normal-equations-gradient-descent-and-linear-regression/
At the end of each of these two articles, information concerning feature scaling with the normal equation is presented.
The question is: do we need to do feature scaling before a normal equation analysis?
You may indeed not need to scale your features, and from a theoretical point of view you get the solution in just one "step". In practice, however, things might be a bit different.
Notice the matrix inversion in your formula. Inverting a matrix is not a trivial computational operation. In fact, there's a measure of how hard it is to invert a matrix (and perform some other computations), called the condition number:
If the condition number is not too much larger than one (but it can still be a multiple of one), the matrix is well-conditioned, which means its inverse can be computed with good accuracy. If the condition number is very large, then the matrix is said to be ill-conditioned. Practically, such a matrix is almost singular, and the computation of its inverse, or the solution of a linear system of equations, is prone to large numerical errors. A matrix that is not invertible has a condition number equal to infinity.
P.S. A large condition number is actually the same problem that slows down gradient descent's convergence.
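A small NumPy illustration of this point, using synthetic data with a deliberately exaggerated feature scale:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.uniform(0, 1, n)          # feature on a small scale
    x2 = rng.uniform(0, 10_000, n)     # feature on a huge scale
    X = np.column_stack([np.ones(n), x1, x2])
    y = 3 + 2 * x1 + 0.001 * x2 + rng.normal(0, 0.1, n)

    print(np.linalg.cond(X.T @ X))     # huge: ill-conditioned

    # Standardize the non-intercept columns and compare.
    Xs = X.copy()
    Xs[:, 1:] = (X[:, 1:] - X[:, 1:].mean(axis=0)) / X[:, 1:].std(axis=0)
    print(np.linalg.cond(Xs.T @ Xs))   # orders of magnitude smaller

    # theta = (X^T X)^(-1) X^T y; lstsq solves it in a numerically stable way.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(theta)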
You don't need to perform feature scaling when using the normal equation; it's useful only for speeding up the gradient descent method. The article from Stanford University provides the correct information.
Of course, you can scale the features in this case as well, but it will not bring you any advantages (and will cost you some additional computation).

Binary classification using radial basis kernel SVM with a single feature [closed]

Is there any interpretation (graphical or otherwise) of a radial basis kernel SVM trained with a single feature? I can visualize the effect in two dimensions, where the result is a separation boundary that is curved rather than a straight line (e.g. http://en.wikipedia.org/wiki/File:Kernel_Machine.png).
I'm having trouble imagining what this would look like if the original data had only a single feature. What would the boundary look like in this case?
In one dimension your data would be numbers, and the decision boundary would simply be a finite set of numbers, representing a finite set of intervals classified as one class and a finite set of intervals classified as the other.
In fact, the decision boundary in R^2 is the set of points x for which the weighted sum of Gaussians centered at the support vectors, sum_i alpha_i y_i exp(-gamma ||x - x_i||^2) (where the alpha_i are the weights), equals -b (the negated intercept/threshold term). You can actually draw this function (in 3D now). Similarly, in 1D you get an analogous function that can be drawn in 2D, and the decision is based on whether this function is bigger or smaller than -b.
This video shows what happens in a kernel mapping. It does not use the RBF kernel, but the idea is the same:
http://www.youtube.com/watch?v=3liCbRZPrZA
As for the 1D case, there is not much difference: the prediction would look like a line that switches back and forth between two colors (one color for each class). Nothing special happens in 1D, other than an SVM being overkill.
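A quick scikit-learn sketch with synthetic 1D data makes the finite set of switch points visible:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Two classes alternating along the real line: an RBF kernel can
    # separate them, a linear SVM cannot.
    X = np.concatenate([rng.normal(-2, 0.3, 50),
                        rng.normal(0, 0.3, 50),
                        rng.normal(2, 0.3, 50)]).reshape(-1, 1)
    y = np.array([0] * 50 + [1] * 50 + [0] * 50)

    clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

    # Scan the line: the predicted class switches a finite number of times,
    # so the "boundary" is just a handful of points bounding the intervals.
    grid = np.linspace(-4, 4, 17).reshape(-1, 1)
    for x, p in zip(grid.ravel(), clf.predict(grid)):
        print(f"x = {x:+.1f} -> class {p}")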

How to subtract color pixels [closed]

A lot of the research papers I am reading these days just abstractly write image1 - image2.
I imagine they mean grayscale images, but how does this extend to color images?
Do I take the intensities and subtract? And how would I compute these intensities: by taking the average, or by taking a weighted average as illustrated here?
I would also prefer it if you could cite a source for this, preferably a research paper or a textbook.
Edit: I am working on motion detection, where there are tons of algorithms that create a background model of the video (an image) and then subtract the current frame (again, an image) from this model. If the difference exceeds a given threshold, the pixel is classified as a foreground pixel. So far I have been subtracting the intensities directly, but I don't know whether another approach is possible.
Subtracting directly in RGB space, or after converting to grayscale, can miss useful information and at the same time introduce many unwanted outliers. It is also possible that you don't need the subtraction operation at all: by investigating the intensity difference between the background and the object in all three channels, you can determine the range of the background in each channel and simply set those pixels to zero. This study demonstrated that such a method is robust against non-salient motion (such as moving leaves) in the presence of shadows in various environments.
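If you do stick with direct subtraction, a minimal per-channel differencing sketch with NumPy/OpenCV looks like this (the file names and the threshold value are placeholders):

    import numpy as np
    import cv2

    background = cv2.imread("background.png").astype(np.int16)  # H x W x 3 (BGR)
    frame = cv2.imread("frame.png").astype(np.int16)

    # Subtract channel-wise; int16 avoids uint8 wrap-around on negatives.
    diff = np.abs(frame - background)

    # A pixel is foreground if the difference is large in any channel
    # (alternatives: Euclidean distance across channels, or a luminance diff).
    threshold = 30
    foreground = (diff.max(axis=2) > threshold).astype(np.uint8) * 255

    cv2.imwrite("mask.png", foreground)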
