I have a fastai collaborative filtering model. I would like to predict with this model for a new tuple, but I am having trouble with the predict function.
From their documentation,
Signature: learn.predict(item, rm_type_tfms=None, with_input=False)
Docstring: Prediction on `item`, fully decoded, loss function decoded and probabilities
File: ~/playground/virtualenv/lib/python3.8/site-packages/fastai/learner.py
Type: method
How do I define the `item` that I need to pass? Let's say, for a MovieLens dataset and a user already within the dataset, we would like to recommend a set of movies. How do we pass the user ID?
I have tried to follow somewhat of an answer here - https://forums.fast.ai/t/making-predictions-with-collaborative-filtering/3900
learn.predict( [np.array([3])] )
I seem to get an error: TypeError: list indices must be integers or slices, not list
I think this will help:
https://medium.com/#igorirailean/a-recommender-system-using-fastai-in-google-colab-110d363d422f
The documentation also contains the following information:
dl = learn.dls.test_dl(test_df)
learn.get_preds(dl=dl)
It helped me.
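A minimal sketch of that approach for a collaborative-filtering learner (the column names userId and movieId are assumptions and must match the columns the DataLoaders were built from):

import pandas as pd

# hypothetical user/movie pairs to score; column names must match training
test_df = pd.DataFrame({'userId': [3, 3, 3], 'movieId': [10, 20, 30]})
dl = learn.dls.test_dl(test_df)     # build a test DataLoader from the frame
preds, _ = learn.get_preds(dl=dl)   # one predicted rating per row of test_df

The highest-scoring movieId values for a given user are then the recommendations.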
I have the following conceptual problem.
I have trained an auto_arima model including an exogenous variable and now I would like to do forecasts based on an existing time series.
My training looked like this:
stepwise_model = auto_arima(train_data, exogenous=exo_train_data,
                            start_p=1, start_q=1, max_p=7, max_q=7,
                            seasonal=True, start_P=1, start_Q=1, max_P=7, max_D=7, max_Q=7, m=7,
                            d=None, D=None, trace=True, error_action='ignore',
                            suppress_warnings=True, stepwise=True)
forecast = stepwise_model.predict(n_periods=len(test_data), exogenous=exo_test_data)
This also works wonderfully and provides me with the performance values I wanted.
But now that I have trained my model on the complete time series, the question arises: how can I make predictions if I do not have future values of the exogenous variables?
# Full Training:
stepwise_model_final = auto_arima(all_data, exogenous=exo_all_data,
                                  start_p=1, start_q=1, max_p=7, max_q=7,
                                  seasonal=True, start_P=1, start_Q=1, max_P=7, max_D=7, max_Q=7, m=7,
                                  d=None, D=None, trace=True, error_action='ignore',
                                  suppress_warnings=True, stepwise=True)
The .predict function in this case requires me to also specify the exogenous variable, which of course I don't have available now:
n=tbd
forecast_final = stepwise_model_final.predict(n_periods=n,exogenous= ??? )
Am I fundamentally misunderstanding something here?
It would be great if you could help me here. I have already searched the internet but found no answer to my question.
Thank you very much!
You need the exogenous variables to make the prediction. Basically, ARIMA performs a regression on the exogenous variables to improve the predictions, so you need to pass them to ARIMA.
If you do not have the exogenous variables, you have two options:
Predict the exogenous variables themselves first (e.g. with another ARIMA model) and feed those forecasts into the final model, as sketched below.
Forecast the time series only with the time series itself (endogenous ARIMA) without any exogenous variables.
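A minimal sketch of the first option with pmdarima, assuming a single exogenous variable and a hypothetical horizon n; all_data, exo_all_data, and stepwise_model_final are the names from the question, and the exogenous keyword matches the version used there (newer pmdarima releases call it X):

import numpy as np
from pmdarima import auto_arima

n = 14  # hypothetical forecast horizon

# Step 1: model the exogenous series on its own (purely endogenous model)
exo_model = auto_arima(exo_all_data, seasonal=True, m=7,
                       suppress_warnings=True, error_action='ignore')
exo_future = np.asarray(exo_model.predict(n_periods=n)).reshape(-1, 1)

# Step 2: feed the forecasted exogenous values into the final model
forecast_final = stepwise_model_final.predict(n_periods=n, exogenous=exo_future)

Keep in mind that the uncertainty of the exogenous forecast propagates into the final forecast.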
I'm trying to train a model using H2O.ai's H2O-3 Automl Algorithm on AWS SageMaker using the console.
My model's goal is to predict if an arrest will be made based upon the year, type of crime, and location.
My data has 8 columns:
primary_type: enum
description: enum
location_description: enum
arrest: enum (true/false), this is the target column
domestic: enum (true/false)
year: number
latitude: number
longitude: number
When I use the SageMaker console on AWS and create a new training job using the H2O-3 Automl Algorithm, I specify the primary_type, description, location_description, and domestic columns as categorical.
However in the logs of the training job I always see the following two lines:
Converting specified columns to categorical values:
[]
This leads me to believe the categorical_columns attribute in the training hyperparameters is not being taken into account.
I have tried the following hyperparameters with the same output in the logs each time:
{'classification': 'true', 'categorical_columns':'primary_type,description,location_description,domestic', 'target': 'arrest'}
{'classification': 'true', 'categorical_columns':['primary_type','description','location_description','domestic'], 'target': 'arrest'}
I thought the list of categorical columns was supposed to be delimited by comma, which would then be split into a list.
I expected the list of categorical column names to be output in the logs instead of an empty list, like so:
Converting specified columns to categorical values:
['primary_type','description','location_description','domestic']
Can anyone help me figure out how to get these categorical columns to apply to the training of my model?
Also-
I think this is the code that's running when I train my model but I have yet to confirm that: https://github.com/h2oai/h2o3-sagemaker/blob/master/automl/automl_scripts/train#L93-L151
This seems to be a bug in the h2o3-sagemaker package. The code at https://github.com/h2oai/h2o3-sagemaker/blob/master/automl/automl_scripts/train#L106 shows that it reads categorical_columns directly from the top level of the hyperparameters, not nested under the training field. However, even when the categorical_columns field is moved up a level, the algorithm doesn't recognize it. So there is no solution for this at the moment.
It seems, based on the code here: https://github.com/h2oai/h2o3-sagemaker/blob/master/automl/automl_scripts/train#L106
that the parameter is looking for a comma-separated string, e.g. "cat,dog,bird".
I would try "primary_type,description,location_description,domestic" as the input parameter, rather than ['primary_type', 'description', ... etc.].
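To illustrate the difference, here is a small sketch of how a comma-separated string behaves compared to a stringified Python list (SageMaker passes hyperparameters to the container as strings; the splitting shown here only illustrates what the train script appears to do, it is not the script's exact code):

# hyperparameters arrive in the container as a flat dict of strings
hyperparameters = {
    'categorical_columns': 'primary_type,description,location_description,domestic'
}

# a comma-separated string splits cleanly into the column names ...
cols = hyperparameters.get('categorical_columns', '').split(',')
print(cols)  # ['primary_type', 'description', 'location_description', 'domestic']

# ... whereas a stringified Python list does not
bad = "['primary_type','description']".split(',')
print(bad)   # ["['primary_type'", "'description']"]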
I did some experiments with the ARIMA model on 2 datasets
Airline passengers data
USD vs Indian rupee data
I am getting a normal zig-zag prediction on Airline passengers data
ARIMA order=(2,1,2)
Model Results
But on USD vs Indian rupee data, I am getting prediction as a straight line
ARIMA order=(2,1,2)
Model Results
SARIMAX order=(2,1,2), seasonal_order=(0,0,1,30)
Model Results
I tried different parameters, but for the USD vs Indian rupee data I always get a straight-line prediction.
One more doubt: I have read that the ARIMA model does not support time series with a seasonal component (for that we have SARIMA). Then why, for the Airline passengers data, is the ARIMA model producing predictions with a cycle?
Having gone through a similar issue recently, I would recommend the following:
Visualize the seasonal decomposition of the data to make sure that seasonality actually exists in your data. Please make sure that the dataframe has a frequency set on its index. You can enforce a frequency on a pandas dataframe with the following:
dh = df.asfreq('W')  # for weekly resampled data; fill the resulting NaNs with an appropriate method, e.g. dh = dh.ffill()
Here is a sample code to do seasonal decomposition:
import statsmodels.api as sm
import matplotlib.pyplot as plt

decomposition = sm.tsa.seasonal_decompose(dh['value'], model='additive',
                                          extrapolate_trend='freq')  # additive or multiplicative is data specific
fig = decomposition.plot()
plt.show()
The plot will show whether seasonality exists in your data. Please feel free to go through this amazing document regarding seasonal decomposition. Decomposition
If you're sure that the seasonal period of your data is 30, then you should be able to get a good result with the pmdarima package. The package is extremely effective at finding optimal pdq values for your model. Here is the link to it: pmdarima
Example code: pmdarima
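A minimal sketch of such a search, assuming the daily series lives in dh['value'] and using the candidate seasonal period of 30 from the question (all other settings are illustrative defaults):

from pmdarima import auto_arima

model = auto_arima(dh['value'],
                   seasonal=True, m=30,   # candidate seasonal period from the question
                   d=None, D=None,        # let the package pick the differencing orders
                   stepwise=True, trace=True,
                   error_action='ignore', suppress_warnings=True)
print(model.summary())
forecast = model.predict(n_periods=30)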
If you're unsure about seasonality, please consult with a domain expert about the seasonal effects of your data or try experimenting with different seasonal components in your model and estimate the error.
Please make sure that the stationarity of the data is checked with a Dickey-Fuller test before training the model. pmdarima supports finding the d component with the following:
from pmdarima.arima import ndiffs
kpss_diff = ndiffs(dh['value'].values, alpha=0.05, test='kpss', max_d=12)
adf_diff = ndiffs(dh['value'].values, alpha=0.05, test='adf', max_d=12)
n_diffs = max(adf_diff, kpss_diff)
You may also find d with the help of the document I provided here. If the answer isn't helpful, please provide the data source for exchange rate. I will try to explain the process flow with a sample code.
I am able to train a LightGBM model using lgb.train, and I can do the same with the CV version (lgb.cv).
However, while I can at least use the trained model for predictions, I am not sure how to interpret what lgb.cv returns.
Check the official documentation here
Specifically, the returned value is the following:
Returns:
eval_hist – Evaluation history. The dictionary has the following format:
{'metric1-mean': [values], 'metric1-stdv': [values], 'metric2-mean': [values], 'metric2-stdv': [values], ...}
Return type:
dict
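A minimal sketch of using and inspecting that dictionary, assuming a binary-classification setup on a made-up toy dataset (the exact key names can vary slightly between LightGBM versions):

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_classification

# toy data as a stand-in for your own lgb.Dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
train_set = lgb.Dataset(X, label=y)

params = {'objective': 'binary', 'metric': 'auc', 'verbosity': -1}
eval_hist = lgb.cv(params, train_set, num_boost_round=100, nfold=5, seed=0)

# one mean/std value per boosting round for each metric
print(list(eval_hist.keys()))  # e.g. ['auc-mean', 'auc-stdv']
mean_key = [k for k in eval_hist if k.endswith('-mean')][0]
best_round = int(np.argmax(eval_hist[mean_key])) + 1
print('best mean AUC:', max(eval_hist[mean_key]), 'at round', best_round)

The per-round means and standard deviations are typically used to pick num_boost_round before retraining on the full data with lgb.train.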
A very similar topic is discussed here: Cross-validation in LightGBM
I use the predict function in OpenCV to classify my gestures.
svm.load("train.xml");
float ret = svm.predict(mat);//mat is my feature vector
I defined 5 labels (1.0, 2.0, 3.0, 4.0, 5.0), but in fact the values of ret are (0.521220207, -0.247173533, -0.127723947, …).
So I am confused about it. According to the official OpenCV documentation, the function should return a class label (classification) in my case.
Update: I still don't know why this result appears. But I chose new features to train the models, and now the return value of the predict function is what I defined during the training phase (e.g. 1, 2, 3, etc.).
During the training of an SVM you assign a label to each class of training data.
When you classify a sample the returned result will match up with one of these labels telling you which class the sample is predicted to fall into.
There's some more documentation here which might help:
http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
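For illustration, here is a minimal sketch with the newer cv2.ml Python bindings (the question uses the older C++ CvSVM API; the toy features and labels below are made-up placeholders). With the C_SVC type, predict returns the integer labels you assigned during training:

import cv2
import numpy as np

# toy features: 2-D points for 5 gesture classes labelled 1..5
samples = np.random.rand(100, 2).astype(np.float32)
labels = np.repeat(np.arange(1, 6), 20).astype(np.int32).reshape(-1, 1)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)  # classification, so predict returns class labels
svm.setKernel(cv2.ml.SVM_RBF)
svm.train(samples, cv2.ml.ROW_SAMPLE, labels)

# predict returns (retval, results); results holds one predicted label per row
_, results = svm.predict(np.float32([[0.1, 0.2]]))
print(results)  # e.g. [[3.]] - one of the labels assigned at training time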
With Support Vector Machines (SVM) you have a training function and a prediction one. The training function trains on your data and saves that information in an XML file (which makes the prediction step easier when you use a huge amount of training data and have to run the prediction in another project).
Example: 20 images per class, so in your case 20*5 = 100 training images; each image is associated with the label of its class, and all of this information is stored in train.xml.
As for the prediction function, it tells you which label to assign to your test image according to your training data (the whole work you did in the training process). Your prediction results might be good or bad; it all comes down to your training data, I think.
If you want, try calculating the error rate of your classifier to see how often it gives good or bad results.
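A minimal sketch of that error-rate check, assuming an svm trained as in the earlier sketch and a labelled held-out set (the arrays below are placeholders with the same feature layout and labels 1..5):

import numpy as np

# placeholder held-out set; replace with your real test features and labels
test_samples = np.random.rand(25, 2).astype(np.float32)
test_labels = np.repeat(np.arange(1, 6), 5).astype(np.int32)

_, predicted = svm.predict(test_samples)  # svm trained as in the sketch above
error_rate = np.mean(predicted.ravel().astype(np.int32) != test_labels)
print('error rate: {:.1%}'.format(error_rate))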