How can I predict the future data by past data? - machine-learning

If I have several years of past data on employees and want to predict future data based on their vacation-taking behavior, which algorithm or model is best for that? Any recommendations or suggestions?
For example:
I took 10 days off in June 2019 and 15 days off in December 2019, but I took 15 days off in June 2018 and 15 days off in December 2018. What is the prediction for June and December of 2020? Obviously this is not enough data to predict that, but it's just an example to help explain the problem.
Any ideas about it?

Try tutorials on RNN or LSTM algorithms.

Check out time series forecasting.
A few helpful links:
End to end time series project
Seven day mini course
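
Before reaching for an RNN or LSTM, it may help to have a simple baseline to compare against. The sketch below is not from the answers above: it assumes a seasonal-mean baseline, which forecasts each calendar month as the average of the same month in previous years, and it uses only the four data points from the question. The function name `seasonal_mean_forecast` is made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Days off per (year, month), taken from the example in the question.
history = {
    (2018, 6): 15,
    (2018, 12): 15,
    (2019, 6): 10,
    (2019, 12): 15,
}

def seasonal_mean_forecast(history):
    """Forecast each calendar month as the mean of that month in past years."""
    by_month = defaultdict(list)
    for (_year, month), days in history.items():
        by_month[month].append(days)
    return {month: mean(values) for month, values in by_month.items()}

# Baseline forecast for 2020, keyed by month number.
forecast_2020 = seasonal_mean_forecast(history)
print(forecast_2020)
```

With so few observations this is only a sanity check. A more complex model is worth using only if it beats a baseline like this on held-out years.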

Related

How to use logistic regression on time series data?

My question is more theoretical, and I hope someone can help me.
I have socio-demographic data from the statistical office for the years 2007, 2010, 2011 and 2020. All of the variables are categorical, while the dependent variable is binary. I want to explain the dependence and am thinking about using logistic regression.
However, I have some doubts: should I treat this data as a time series, or can I just use it as usual?

Churn Prediction Model for an online fashion company

I have been working on an individual project with a dataset from an online fashion company. I aim to build a churn prediction model. To do that, I set a churn criterion: a customer counts as churned after 12 months of inactivity. But I am confused about which time window of the data I should train my model on. Since churn periods are customer specific, I cannot set a specific date interval.
My dataset covers 2015 to March 2018, and I thought it would be fine to select a sample of customers who have a transaction in 2016. Then I took the last available date in the dataset, which is some day in March 2018, and looked 12 months back to identify who has churned. Then I took the customers I selected (those who made a transaction in 2016) and used all of their transaction data over the available period (2015-2018). I also added a binary feature to the model that indicates whether the customer has a transaction within the last 3 months.
However, I feel there is a mistake here. I am self-taught and could not find a proper guide to building the model on the internet. Most churn prediction write-ups do not discuss data preparation in enough detail. I hope someone can share their ideas with me.
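
One possible way to frame the data preparation, offered as an assumption rather than something stated in the question: pick a cutoff date at least 12 months before the end of the data, build features only from transactions on or before the cutoff, and derive the churn label only from the 12 months after it. That keeps features such as "active in the last 3 months" from overlapping the period that defines the label. The function `build_example`, the feature names, and the cutoff of 1 March 2017 are all hypothetical, and 12 months is approximated as 365 days.

```python
from datetime import date, timedelta

def build_example(transactions, cutoff, churn_days=365, recent_days=90):
    """Build features from history up to `cutoff` and a churn label from after it.

    transactions: list of datetime.date objects for one customer.
    Returns None if the customer has no transaction on or before the cutoff.
    """
    past = [d for d in transactions if d <= cutoff]
    if not past:
        return None
    window_end = cutoff + timedelta(days=churn_days)
    future = [d for d in transactions if cutoff < d <= window_end]

    days_since_last = (cutoff - max(past)).days
    features = {
        "n_transactions": len(past),
        "days_since_last": days_since_last,
        # Recency is measured relative to the cutoff, not to the end of the data.
        "active_recently": int(days_since_last <= recent_days),
    }
    churned = int(len(future) == 0)  # no purchase in the label window
    return features, churned

# Hypothetical cutoff, chosen so the 12-month label window ends inside the data.
cutoff = date(2017, 3, 1)
customer_a = [date(2016, 5, 10), date(2017, 1, 15), date(2017, 9, 1)]
customer_b = [date(2016, 2, 1)]

example_a = build_example(customer_a, cutoff)
example_b = build_example(customer_b, cutoff)
print(example_a)
print(example_b)
```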

Does adding a date column lead to overfitting?

I'm working on a dataset and want to predict whether it will rain or not, so should I include the date column? I haven't built the model yet, but I think it will lead to overfitting.
I don't think the datetime is a vital feature. The season could be a useful feature, although seasonal patterns are shifting nowadays due to climate change and other factors.
In any case, since it's a time series problem, the results depend much more on the conditions of the prior days, though of course there are subtle changes that make it harder to predict.
You can find some existing work below:
https://pdfs.semanticscholar.org/2761/8afb77c5081d942640333528943149a66edd.pdf
(used 2 prior days info as features)
https://stackabuse.com/using-machine-learning-to-predict-the-weather-part-1/ (3 prior days info as features)
I think these are good starting points.
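
As a rough illustration of "prior days as features", here is a minimal sketch that turns a daily series into rows where the previous `n_lags` values are the inputs and the current value is the target. The data and the function name `make_lag_features` are made up for this example; it is not the exact method used in the linked works.

```python
def make_lag_features(series, n_lags):
    """Turn a daily series into (features, target) rows.

    Each row uses the previous `n_lags` values to predict the current one.
    """
    rows = []
    for t in range(n_lags, len(series)):
        features = series[t - n_lags:t]  # values for days t-n_lags .. t-1
        target = series[t]               # value for day t
        rows.append((features, target))
    return rows

# Hypothetical data: 1 = rain, 0 = no rain, one entry per day.
rained = [0, 1, 1, 0, 0, 1]
rows = make_lag_features(rained, n_lags=2)
for features, target in rows:
    print(features, "->", target)
```

The same idea extends to other daily measurements such as temperature or humidity: each one contributes its own lagged columns, and the raw date itself is not needed as an input.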

Machine learning project - my target variable is not evenly distributed in time

I'm working on a machine learning project where I try to predict which clients will buy a specific product (buying the product is my target variable). I have plenty of features about the clients and enough historical data.
My issue is that my target variable is highly seasonal: most of the product is sold in December, and the other months have only a few sales.
What do I have to do to compensate for this imbalance? Does the target variable need some adjustment? I need the model to perform consistently across all months. Thanks
The simplest option would be to include month as a feature in some way. Some options for doing that:
One-hot-encode the month. Pros: very simple. Cons: leads to a rather sparse feature set.
Create a naive-Bayes-type feature encoding the prior probability of a sale in the given month. For example, if 60% of sales are in December and a uniform 3.6% of sales fall in every other month, then this feature would have a value of 0.6 for every sale in December and 0.036 for sales in other months.
For both of these methods, you would want to ensure you have training data from a full 12-month period and a separate evaluation set that also covers a full 12-month period.
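
The two encodings above can be sketched as follows. This is a minimal illustration under a few assumptions: months are numbered 1 to 12, the helper names `one_hot_month` and `month_sale_share` are invented for this example, and the toy data is chosen so that 60% of sales fall in December. The share should be computed on the training period only and then applied unchanged to the evaluation set.

```python
from collections import Counter

def one_hot_month(month):
    """Return a 12-element 0/1 vector with a 1 at the given month (1-12)."""
    return [1 if m == month else 0 for m in range(1, 13)]

def month_sale_share(sale_months):
    """Map each month to the fraction of all sales that fell in that month."""
    counts = Counter(sale_months)
    total = len(sale_months)
    return {m: counts.get(m, 0) / total for m in range(1, 13)}

# Hypothetical training data: the month of each recorded sale.
sale_months = [12] * 6 + [1, 3, 6, 9]
share = month_sale_share(sale_months)

print(one_hot_month(12))  # one-hot vector for December
print(share[12])          # share of sales that happened in December
```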

What kind of classifier is used in the following scenario?

If I am building a weather predictor that predicts whether it will snow tomorrow, it is very easy to just answer "NO" every time.
Obviously, if you evaluate such a classifier on every day of the year, it would be correct with an accuracy of 95% (assuming that I build and test it in a region where it snows very rarely).
Of course, that is such a stupid classifier even if it has an accuracy of 95% because it is obviously more important to predict if it will snow during the winter months (Jan & Feb) as opposed to any other months.
So, if I have a lot of features that I collect about the previous day to predict if it will snow the next day or not, considering that there will be a feature that says which month/week of the year it is, how can I weigh this particular feature and design the classifier to solve this practical problem?
"Of course, that is such a stupid classifier even if it has an accuracy of 95% because it is obviously more important to predict if it will snow during the winter months (Jan & Feb) as opposed to any other months."
Accuracy might not be the best measurement to use in your case. Consider using precision, recall and F1 score.
"how can I weigh this particular feature and design the classifier to solve this practical problem?"
I don't think you should weight any particular feature in any way. You should let your algorithm do that and use cross validation to decide on the best parameters for your model, in order to also avoid overfitting.
If you say January and February are the most important months, consider applying your model only to those two months. If that's not possible, look into giving different weights to your classes (snow / no snow) based on how often each occurs. This question discusses that issue - the concept should be understandable regardless of your language of choice.
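
To make the two suggestions concrete, here is a small sketch with made-up labels (5 snow days out of 100). It shows why the "always no" classifier looks good on accuracy but not on precision, recall, or F1, and it computes class weights with the inverse-frequency heuristic n_samples / (n_classes * class_count), which is the formula scikit-learn documents for `class_weight="balanced"`. The function names are illustrative, not from any library.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

# Hypothetical labels: 1 = snow, 0 = no snow.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # the "always no" classifier

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = precision_recall_f1(y_true, y_pred)
weights = balanced_class_weights(y_true)
print(accuracy, scores, weights)
```

Here the rare snow class gets a much larger weight than the common class, so a misclassified snow day costs the model more during training.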
