I want to create a function that uses the model to predict the price.
https://github.com/kalyaannnn/FirstRepository/blob/main/MumbaiHousing.ipynb
Related
How can I use the code from https://www.statsmodels.org/dev/generated/statsmodels.robust.robust_linear_model.RLM.html for time series data?
I want to estimate the parameter of an AR(1) model with an M-estimator.
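One way to read this question: an AR(1) model is a regression of y[t] on y[t-1], so a regression M-estimator can be applied to the lagged series. The sketch below is not the statsmodels implementation; it is a minimal pure-Python illustration of a Huber M-estimator fitted by iteratively reweighted least squares, assuming a zero-mean AR(1) with no intercept. The function names, the tuning constant 1.345, and the simulated data are all assumptions made for the example.

```python
import random
import statistics


def huber_weights(residuals, scale, c=1.345):
    """Huber weights: 1 for small residuals, c*scale/|r| for large ones."""
    weights = []
    for r in residuals:
        a = abs(r)
        weights.append(1.0 if a <= c * scale else c * scale / a)
    return weights


def fit_ar1_huber(y, c=1.345, n_iter=50, tol=1e-8):
    """Estimate phi in y[t] = phi * y[t-1] + e[t] (no intercept) by IRLS."""
    x, z = y[:-1], y[1:]  # lagged values (regressor), current values (response)
    phi = sum(a * b for a, b in zip(x, z)) / sum(a * a for a in x)  # OLS start
    for _ in range(n_iter):
        resid = [b - phi * a for a, b in zip(x, z)]
        # Robust residual scale: normalised median absolute deviation.
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        w = huber_weights(resid, scale, c)
        # Weighted least-squares update of phi.
        num = sum(wi * a * b for wi, a, b in zip(w, x, z))
        den = sum(wi * a * a for wi, a in zip(w, x))
        new_phi = num / den
        converged = abs(new_phi - phi) < tol
        phi = new_phi
        if converged:
            break
    return phi


def simulate_ar1(phi, n, seed=0, outlier_prob=0.05, outlier_sd=10.0):
    """Hypothetical test data: AR(1) with occasional large innovations."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        sd = outlier_sd if rng.random() < outlier_prob else 1.0
        y.append(phi * y[-1] + rng.gauss(0.0, sd))
    return y


y = simulate_ar1(phi=0.6, n=2000, seed=0)
print(round(fit_ar1_huber(y), 3))
```

With statsmodels, the same idea would presumably be to pass y[1:] as endog and y[:-1] as exog to RLM with a Huber norm. Note that this kind of M-estimator downweights large residuals only, so it protects against outlying innovations more than against isolated bad observations, which also appear as regressors.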
I want to build a time series model (for example ARIMA or an LSTM) to forecast a single time series column, but separately for each customer. My data has the columns customer_name, date, and metric, and I want to forecast, say, what the metric will be for customer1 tomorrow.
The actual question is: how do I include the categorical column customer_name in the model?
I can build a model on the metric column alone, but what I need is a forecast of the metric for a particular customer.
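One common approach, sketched here under assumptions, is not to feed customer_name into the model at all but to split the data by customer and fit one model per series. In the sketch, `forecast_next` is a hypothetical placeholder (a moving average) standing in for whatever ARIMA or LSTM model is actually used, and the rows are invented sample data in the customer_name, date, metric layout described above.

```python
from collections import defaultdict


def group_by_customer(rows):
    """Turn (customer_name, date, metric) rows into {customer: [metrics by date]}."""
    series = defaultdict(list)
    for customer, date, metric in rows:
        series[customer].append((date, metric))
    return {
        customer: [m for _, m in sorted(points, key=lambda p: p[0])]
        for customer, points in series.items()
    }


def forecast_next(values, window=3):
    """Placeholder forecaster (assumption): mean of the last `window` values."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def forecast_per_customer(rows, window=3):
    """Fit/forecast independently for each customer's own series."""
    return {
        customer: forecast_next(values, window)
        for customer, values in group_by_customer(rows).items()
    }


# Invented sample data; ISO date strings sort chronologically.
rows = [
    ("customer1", "2024-01-01", 10.0),
    ("customer2", "2024-01-01", 100.0),
    ("customer1", "2024-01-02", 12.0),
    ("customer2", "2024-01-02", 110.0),
    ("customer1", "2024-01-03", 14.0),
    ("customer2", "2024-01-03", 120.0),
]
forecasts = forecast_per_customer(rows)
print(forecasts["customer1"])
```

The alternative is a single global model that takes an encoded customer identifier as an extra input, which tends to suit neural models such as LSTMs better than ARIMA.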
I have built a random forest model and saved it. It uses a few numeric variables and a few one-hot encoded variables that were converted to factors.
Some of the records in the new data are also part of the training data, and the predicted probabilities differ for these identical records.
The saved model is named Rfmod.
I used the following code to run the predictions:
load(Rfmod)
Pred <- predict(Rfmod, newdata, type ='prob')
The probabilities for the records common to both the training data and the new data are not the same. Any thoughts on this? I have also tried passing the newdata option to the predict function, but the difference is still there.
I have a training data set containing college names, student rank, branch, and college cutoff. Which prediction model should I use to predict the list of colleges a student can get admission to, based on their rank, the college cutoff, and the branch?
I am new to machine learning.
I expect the output to be a list of colleges to which a student can be admitted, rather than whether a particular college is allocated to the student.
Your problem can be treated as a multi-class classification problem in which every college is a class. You can use a simple random forest model and predict the class probabilities for every student record. Because you are working with probabilities, the model returns every college along with its probability. Set a probability threshold and take the colleges above that threshold as your result.
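The thresholding step can be sketched as follows. The probabilities are invented for illustration and stand in for one student's row of class probabilities from a fitted classifier (for example, what a `predict_proba` call would return, mapped to college names); the function name and the 0.2 threshold are assumptions.

```python
def colleges_above_threshold(class_probs, threshold=0.2):
    """Return (college, probability) pairs at or above the threshold, highest first."""
    selected = [(c, p) for c, p in class_probs.items() if p >= threshold]
    return sorted(selected, key=lambda cp: cp[1], reverse=True)


# Invented class probabilities for a single student.
probs = {"College A": 0.55, "College B": 0.30, "College C": 0.10, "College D": 0.05}

for college, p in colleges_above_threshold(probs, threshold=0.2):
    print(college, p)
```

The threshold controls how long the list is: a lower value returns more colleges with less certain admission.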
This is a multiclass classification problem. If you are new, I suggest using tree-based models such as the random forest classifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html), or trying XGBoost if random forest does not give good enough results. They are easy to use and perform well on multiclass classification problems. They also give you feature importances, which help you explain your model.
I am learning about the ARIMA model. My training set consists of 1) a date, 2) about 20 input features for each date, 3) output variable. Do ARIMA models take in as input multiple input features and then predict one of the features? Or do they only operate on a single variable?
Plain ARIMA models are univariate: they model a single series from its own past values and past errors, so they do not take exogenous variables. Several extensions of ARIMA do include exogenous variables, such as ARIMAX models, transfer function models, and dynamic regression models.
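As a rough illustration of how the extensions differ, one standard way to write a dynamic regression (regression with ARIMA errors) is shown below. The notation is an assumption made for this sketch: x_{j,t} are the k input features, B is the backshift operator, and \phi(B) and \theta(B) are the usual AR and MA polynomials.

```latex
\begin{aligned}
y_t &= \beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots + \beta_k x_{k,t} + \eta_t, \\
\phi(B)\,(1 - B)^d\,\eta_t &= \theta(B)\,\varepsilon_t,
\qquad \varepsilon_t \sim \text{white noise}.
\end{aligned}
```

Dropping the x terms leaves y_t = \eta_t, which is an ordinary ARIMA(p, d, q) model for the output variable alone. With about 20 input features, the features would enter through the regression part, while the ARIMA part models what they leave unexplained.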