MAE and MAPE are not consistent - time-series

I compare two forecasting models using MAE and MAPE:
The first model gives me:
MAE(test): 797.95725
MAPE(test): 220.59072
The second model gives me:
MAE(test): 823.49909
MAPE(test): 203.40554
Now I'm confused about which model is better: the first model has a lower MAE, and the second model has a lower MAPE.

It depends on whether you are forecasting many time series or only one, and on whether the values of your time series can have different orders of magnitude. If you are comparing several time series, or the values can have different orders of magnitude, it is better to choose scale-independent metrics such as MAPE, MdAPE, or Theil's U-statistic.
Personally, I prefer MAPE over MAE, because it considers relative errors rather than absolute errors.
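As a rough illustration (with made-up numbers, not the ones from the question), here is a small NumPy sketch of how the two metrics can rank models differently when the series contains values of different magnitudes:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute deviation, in the units of the data."""
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: average absolute deviation relative to the true value."""
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical actual values spanning different orders of magnitude
y_true = np.array([10.0, 50.0, 2000.0])
pred_a = np.array([12.0, 60.0, 2010.0])  # small absolute errors, but large relative error on the small values
pred_b = np.array([10.5, 52.0, 2100.0])  # larger absolute error, concentrated on the large value

print(mae(y_true, pred_a), mape(y_true, pred_a))  # model A: lower MAE, higher MAPE
print(mae(y_true, pred_b), mape(y_true, pred_b))  # model B: higher MAE, lower MAPE
```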

Related

What GARCH model do I use for relative spread series?

I have tried multiple GARCH model variations to remove the financial time series characteristics from my dataset. I mainly tried ARMA(1,1)-sGARCH models with a normal distribution. My standardized residuals and squared standardized residuals no longer show serial correlation, which is good. However, the values of the goodness-of-fit test are always 0 or very small, which I think indicates that the model choice is not appropriate. What GARCH specification should I use?
(My dataset is a financial time series of daily relative spreads, i.e. the spread divided by the mean of the ask and bid prices of that day.)
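For reference, a minimal Python sketch of the workflow described above (fit a GARCH-type model, then check the standardized residuals for remaining serial correlation), using the `arch` and `statsmodels` packages. The data is a placeholder, and the AR(1) mean is a simplification of the ARMA(1,1) mean mentioned in the question, since `arch` does not fit an MA term:

```python
import numpy as np
import pandas as pd
from arch import arch_model
from statsmodels.stats.diagnostic import acorr_ljungbox

# Placeholder series standing in for the daily relative spreads
spreads = pd.Series(np.random.default_rng(0).normal(size=1000))

# AR(1) mean + GARCH(1,1) volatility with normal errors
am = arch_model(spreads, mean="AR", lags=1, vol="GARCH", p=1, q=1, dist="normal")
res = am.fit(disp="off")

# Standardized residuals and their squares should show no remaining serial correlation
std_resid = res.std_resid.dropna()
print(acorr_ljungbox(std_resid, lags=[10]))       # levels
print(acorr_ljungbox(std_resid ** 2, lags=[10]))  # squares
```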

What's the correct approach to evaluate a model using K-fold cross validation over multiple seeds?

I am training a deep learning model using 5-fold CV over three random seeds (the random seeds are for model initialization; the CV split is created once). For each fold, I save the best model, so I end up with 15 models after the simulation. To assess performance, I take the best of these 15 models (kept unchanged during the entire evaluation process) and evaluate it on the validation fold of all 5 folds for each seed. I then average the results across these seeds.
I would like to know if I am doing the right thing here.
I have read that there are two ways to compute CV performance: [1] pooling, where the performance is calculated globally over the union of all the test sets, and [2] averaging, where the performance is computed for each test set separately and the result is the average of these.
I intend to use method two (averaging).
Yes, you can use the averaging method for the 5-fold CV, but I don't understand what you mean by "For each fold, I save the best model". Moreover, three random seed values are not enough. You should use at least 10 different values and plot a box plot of the corresponding results across these seeds.
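A minimal sketch of the averaging method, using a scikit-learn MLP as a stand-in for your deep learning model (the seed only controls the model initialization; the CV split is created once):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)          # placeholder dataset
kf = KFold(n_splits=5, shuffle=True, random_state=0)  # CV split fixed once
folds = list(kf.split(X))

per_seed_scores = []
for seed in range(10):  # ~10 seeds rather than 3, as suggested above
    fold_scores = []
    for train_idx, val_idx in folds:
        # The seed controls only the model initialization, not the CV split
        model = make_pipeline(StandardScaler(),
                              MLPClassifier(max_iter=500, random_state=seed))
        model.fit(X[train_idx], y[train_idx])
        fold_scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))
    per_seed_scores.append(np.mean(fold_scores))  # "averaging": mean over the 5 folds

print(f"mean over seeds: {np.mean(per_seed_scores):.3f} "
      f"+/- {np.std(per_seed_scores):.3f}")
```

The `per_seed_scores` list is what you would plot as a box plot to show the variability due to initialization.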

Evaluating accuracy of a decision tree/forest model

I'm relatively new to ML. I've created a decision tree model to predict the price of an item based on some criteria.
For example, let's say the model predicts the price of a car based on a few features such as engine size, number of doors, fuel type, mileage, and age.
Analysis of the data showed me that my data was not linear, so a decision tree was a better fit. The model also does an OK job at predicting, but before I can give it to any users, I need to quantify its accuracy.
As it's non-linear, R-squared doesn't seem like a good method of assessing accuracy, but I'm unsure what I should use.
I'd appreciate any advice on this.
In these cases, what you can usually do is assess the performance of the model against a test or hold-out set (not used during the construction of the model), using an evaluation metric.
For regression problems (like the one you are describing) there are several evaluation metrics available. The most common ones are MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error).
To fully understand how good your model's performance is, you can then compare it against other models, or against simple baselines (like always predicting the average price, or returning the price of the most similar car in the training set).
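For example, a minimal scikit-learn sketch (with placeholder data, not your car dataset) that computes MAE and RMSE on a hold-out set and compares the tree against a predict-the-average baseline:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Placeholder features/prices just to make the sketch runnable
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 20000 + 3000 * X[:, 0] - 1500 * X[:, 4] + rng.normal(scale=1000, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_train, y_train)
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)  # always predicts the average price

for name, model in [("tree", tree), ("mean baseline", baseline)]:
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name}: MAE={mae:.0f}, RMSE={rmse:.0f}")
```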

machine learning, nominal data normalization

I am working on k-means clustering.
I have a 3-D dataset with the features: number of days, frequency, and food.
-> The day feature is standardized (using the mean and standard deviation, i.e. z-scored), which gives me a range of about [-2, 14].
-> Frequency and food, which are NOMINAL features in my dataset, are normalized by dividing by the maximum (x/max(x)), which gives a range of [0, 1].
The problem is that k-means only considers the day axis for grouping, since there are obvious gaps between points along this axis, and it almost ignores the other two dimensions, frequency and food (I think because of the negligible gaps in those dimensions).
If I apply k-means on the day axis alone (1-D), I get exactly the same result as applying it on the 3-D data (days, frequency, food).
(I also tried x/max(x) for days, but the result was not acceptable.)
So I want to know whether there is any way to normalize the other two nominal features, frequency and food, so that the scaling is fair relative to the day axis.
food => 1,2,3
frequency => 1-36
The point of normalization is not just to get the values small.
The purpose is to have comparable value ranges - something which is really hard for attributes of different units, and may well be impossible for nominal data.
For your kind of data, k-means is probably the worst choice, because k-means relies on continuous values to work. If you have nominal values, it usually gets stuck easily. So my main recommendation is to not use k-means.
For k-means to work on your data, a difference of 1 must mean the same thing in every attribute. So a 1-day difference would have to equal the difference between food 1 and food 2. And because k-means is based on squared errors, the difference between food 1 and food 3 counts 4x as much as the difference between food 1 and food 2.
Unless your data has this property, don't use k-means.
You can try to use the Value Difference Metric (VDM), or any variant of it, to convert pretty much any nominal attribute you encounter into a valid numeric representation. And after that you can just apply standardisation to the whole dataset as usual.
The original definition is here:
http://axon.cs.byu.edu/~randy/jair/wilson1.html
Although it should be easy to find implementations for every common language elsewhere.
N.B. For ordered nominal attributes such as your 'frequency', most of the time it is enough to just represent them as integers.
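For example, once frequency and food are represented as integers (with the caveats about truly nominal data discussed above, or after a VDM-style conversion), a minimal scikit-learn sketch that z-scores every column before k-means, so that no single axis dominates the squared-error objective (the data below is a placeholder):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Placeholder data in the shapes described in the question:
# days, frequency (1-36), food (1-3)
rng = np.random.default_rng(0)
data = np.column_stack([
    rng.integers(0, 200, size=300),  # number of days
    rng.integers(1, 37, size=300),   # frequency, treated as an ordered integer
    rng.integers(1, 4, size=300),    # food category, 1-3
]).astype(float)

# z-score every column so all three axes have comparable ranges
scaled = StandardScaler().fit_transform(data)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(np.bincount(labels))  # cluster sizes
```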

What Is the MAE Actually Telling Me?

I've created a simple linear regression model to predict S&P 500 closing prices, then calculated the Mean Absolute Error (MAE) and got an MAE score of 1290. Now, I don't want to know if this is right or wrong, but I want to know what an MAE of 1290 is telling me about my model.
To be honest "in general" it tells you nearly nothing. The value is quite arbitrary, and only if you understand exactly your data you can draw any conclusions.
MAE stands for Mean Absolute Error, thus if yours is 1290 it means, that if you randomly choose a data point from your data, then, you would expect your prediction to be 1290 away from the true value. Is it good? Bad? Depends on the scale of your output. If it is in millions, then the error this big is nothing, and the model is good. If your output values are in the range of thousands, this is horrible.
If I understand correctly S&P 500 closing prices are numbers between 0 and 2500 (for last 36 years), thus error of 1290 looks like your model learned nothing. This is pretty much like a constant model, always answering "1200" or something around this value.
MAE obtained with a model should always be verified against a baseline model.
A frequently used baseline is median value assignment. Calculate the MAE for the case when all your predictions are equal to the median of your target variable vector, then see for yourself whether your model's MAE is significantly below that. If it is, congrats.
Note that in this case the baseline MAE will depend on the target distribution. If your test sample contains lots of instances that are really close to the median, then it will be almost impossible to get a model with a MAE better than the baseline. Thus, MAE should only be used when your test sample is sufficiently diverse. In the extreme case of only one instance in the test sample, you will get a baseline MAE of 0, which will always be no worse than any model you may come up with.
This issue with MAE is especially notable when you compute a MAE for your total sample and then want to check how it changes across different subsamples. Say you have a model that predicts yearly income based on education, age, marital status, etc. You get a MAE of $1.2k, the baseline MAE is $5k, so you conclude that your model is pretty good. Then you check how the model deals with bottom-earners and get a MAE of $1.7k with a baseline of $0.5k. The same is likely to occur if you inspect the errors in the 18-22-year-old demographic.
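A minimal sketch of the median-baseline comparison described above, with made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical targets and predictions, just to illustrate the comparison
y_train = np.array([1150.0, 1230.0, 1190.0, 1310.0, 1270.0, 1220.0])
y_test  = np.array([1200.0, 1280.0, 1160.0, 1340.0])
y_pred  = np.array([1195.0, 1300.0, 1180.0, 1290.0])  # model predictions

model_mae = mean_absolute_error(y_test, y_pred)

# Baseline: always predict the median of the training target
baseline_pred = np.full_like(y_test, np.median(y_train))
baseline_mae = mean_absolute_error(y_test, baseline_pred)

print(f"model MAE: {model_mae:.1f}  vs  median-baseline MAE: {baseline_mae:.1f}")
```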
