I have a question about optimizing the time we send messages (email/push/text, etc.) to our subscribers. The desired output is a time interval per day for each person.
We have a history of when each person opened/clicked our messages, their demographic information, and some other browsing history. But I am not sure this is suited to a machine learning model, since individuals behave so differently and I don't have many good predictors.
Should I just summarize the best reach time for each person from the historical data, or could this be a machine learning model?
When using AWS Forecast, is there some way to augment our model with "partial future information" in order to improve forecasts?
I have been getting quite solid looking predictions from AWS Forecast so far, but suspect that I could improve the predictions somewhat substantially if I could provide some information about known future events.
I'm very new to forecasting and machine learning and by "partial future information", I mean:
I am trying to predict how the time-series of variable X will behave in the future
I am training a model with past time-series information for many different variables, including X
I would like to also provide known future time-series information for a subset of these variables because 1) they should have a significant impact on predictions and 2) this would give me the ability to perform "what-if" analysis
To be more concrete:
I am trying to predict future revenue from past revenue, web traffic volume, advertising spending, and promotional discounts
AWS Forecast has been providing me with good forecasts so far (I hold back several months of known data from the model, and its predictions for that "future" match the known data quite well)
However, I would really like to also tell AWS Forecast about, for example, a significant advertising campaign that is planned for the near future
I would also really like to be able to vary some future variable or variables and see how they affect the outcome ("what if I spend $Z on advertising next month?")
Currently, I am providing all of our past revenue, web traffic volume, advertising spending, and promotional discount information to AWS Forecast as a "Target Time Series" in the format of a single CSV file with 3 columns (metric name, timestamp, metric value); approximately 15 distinct values of metric name; and about 10,000 total rows of data (several years worth of daily values of 15 variables = ~ 2 * 365 * 15 = ~ 11,000 rows). Every metric is provided over the same time interval (for instance, all of the metrics are provided between 2017-10-01 and 2019-11-25).
I'd like to provide some additional, partial data that highlights known future significant events (spending on advertising, promotional discounts) to improve our predictions even further.
For example:
Revenue from 2017-10-01 to 2019-11-25
Web traffic from 2017-10-01 to 2019-11-25
Ad spend from 2017-10-01 to 2019-11-25
Promotional discounts from 2017-10-01 to 2019-11-25
plus planned ad spend for 2019-11-26 to 2020-02-01
Can someone please help me with some of the terminology and the "how-to" mechanics of this?
In general, to use a variable from your historical data, you also need a forecast of it in the future. It would be like trying to forecast electricity usage and putting historical temperatures in the data set: if you don't have a forecast of future temperatures, that information does you no good in improving your forecast. You now know the effect of an extra degree of temperature on electricity usage, but what do you do with that if you have no idea what the temperature will be tomorrow?
In your case you have 1 metric you want to forecast (revenue) and three supporting pieces of data: traffic, ad spend, discount. It's great that you have future ad spend, but without the other two, you're a bit out of luck (per the prior paragraph).
However, you can still do something here; you'll just have to make some assumptions. What I would do is choose a fixed value for each missing variable and apply it to all future dates. Appropriate values might be a discount of zero (full-price items) and web traffic of, say, 1K per day (I'm making that number up). Now you have complete data sets for past and future.
With that set up you could now answer the question, albeit with a caveat. The forecast you get out is now saying...
Here's how much revenue we can expect given our planned ad spend, if we offer no discounts and we get 1K people to the website every day.
Perhaps you could improve that by inputting traffic values in the future that are the same from a year prior. In which case, you could now say ...
Here's how much revenue we can expect given our planned ad spend, if we offer no discounts and the website gets the same traffic as this time last year.
You can take that to variations such as "traffic goes up 10%" or you can take a guess at what the discounts will be or, like before, you could replicate your discounts and traffic from a year prior and say...
Here's how much revenue we can expect given our planned ad spend, if we offer discounts just like last year and see website traffic just like last year.
I suspect you get the idea, so I'll stop listing variations. These are, of course, really just forecasts of those supporting series; however, it's worth noting that "creating a forecast" of discounts or web traffic doesn't have to be complicated and fancy. "The same as last year" is a perfectly valid "forecast" of what's to come.
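As a concrete illustration of the "same as last year" approach, here's a minimal Python sketch (not AWS Forecast-specific; the function name and the toy data are mine) that extends a daily series into the forecast horizon by copying the value from 365 days earlier:

```python
from datetime import date, timedelta

def extend_with_last_year(series, horizon_start, horizon_end):
    """Fill future dates of a daily series with the value from exactly
    365 days earlier -- a naive but perfectly valid 'forecast'."""
    extended = dict(series)  # copy; the original history is untouched
    d = horizon_start
    while d <= horizon_end:
        extended[d] = series[d - timedelta(days=365)]
        d += timedelta(days=1)
    return extended

# Toy example: two observed days of traffic a year ago,
# projected onto the matching days in the forecast horizon.
traffic = {
    date(2018, 11, 26): 950,
    date(2018, 11, 27): 1020,
}
future = extend_with_last_year(traffic, date(2019, 11, 26), date(2019, 11, 27))
```

On the terminology part of the question: in AWS Forecast, covariates that are known into the future like this are typically supplied as a "related time series" dataset, separate from the target time series.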
I'm trying to build a model that gives the probability of every customer in a database showing up on a certain day (i.e. I pass in 8/25/19 and get back the list of all customers with their respective probabilities). I have the transaction logs for all customers, with dates. I'm thinking of using some sort of RNN for this. Is that the proper approach? If not, what is the best way to do it? I want to discover patterns and high-confidence leads for which customers will show up. There are around 400,000 records covering 3 years.
You have time series data.
RNN is a good starting point. Check out these step-by-step instructions for sales prediction. An RNN is an easy start and might give you really good quality. There is also an adaptation of the xgboost algorithm for time series that gives good quality too, but it might be slower.
Good luck!
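Before reaching for an RNN, it's worth benchmarking a simple baseline. This sketch (the function and the toy logs are my own, purely illustrative) estimates each customer's show-up probability as their historical visit rate on the target weekday:

```python
from collections import defaultdict
from datetime import date, timedelta

def show_up_probability(logs, target_day):
    """For each customer, estimate P(shows up on target_day) as the
    historical visit rate on that weekday within the customer's own
    observation window. logs is a list of (customer_id, date) pairs."""
    visits = defaultdict(set)  # customer -> set of distinct visit dates
    for customer, day in logs:
        visits[customer].add(day)
    weekday = target_day.weekday()
    probs = {}
    for customer, days in visits.items():
        first, last = min(days), max(days)
        # how many times this weekday occurred in the customer's window
        occurrences = sum(
            1 for i in range((last - first).days + 1)
            if (first + timedelta(days=i)).weekday() == weekday
        )
        hits = sum(1 for d in days if d.weekday() == weekday)
        probs[customer] = hits / occurrences if occurrences else 0.0
    return probs

# Toy logs: "a" visits both Mondays in their window, "c" only one of two.
logs = [
    ("a", date(2019, 8, 5)), ("a", date(2019, 8, 12)),
    ("c", date(2019, 8, 5)), ("c", date(2019, 8, 13)),
]
probs = show_up_probability(logs, date(2019, 8, 26))  # a Monday
```

If an RNN can't beat this kind of per-customer weekday rate on held-out data, the extra complexity isn't paying for itself.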
I am looking for solutions where I can automatically approve or disapprove different supplier invoices based on historical data.
Let's say, I got an invoice from an HP laptop supplier and based on the previous data, I have to approve or reject that invoice.
Basically, I want to make a decision or prediction based on the historical data already available, using artificial intelligence, machine learning, or another cloud service.
This isn't really a direct question, but you can start by looking into various methods of classification. There is a huge amount of material available online. Try reading about K-Nearest Neighbors, Naive Bayes, K-means, etc. to get an idea of how algorithms in the machine learning domain work. Once you start understanding what is written in the documentation, start implementing them. You will run into a lot of problems, which you can search online, and I'm sure you will find most of them answered on this site.
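To get a feel for how one of those classifiers could apply here, this is a minimal k-nearest-neighbours sketch on made-up invoice features (the feature choice, numbers, and labels are purely illustrative, not a real approval policy):

```python
import math

def knn_predict(train, query, k=3):
    """Tiny k-nearest-neighbours classifier. train is a list of
    (feature_vector, label) pairs; returns the majority label among
    the k closest training points by Euclidean distance."""
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Hypothetical invoice features: (amount, days_since_last_order)
history = [
    ((1200.0, 30), "approve"),
    ((1150.0, 28), "approve"),
    ((1300.0, 35), "approve"),
    ((9000.0, 2),  "reject"),
    ((8500.0, 1),  "reject"),
]
decision = knn_predict(history, (1250.0, 31), k=3)
```

In practice you'd normalize the features and engineer more of them (supplier, category, price vs. historical price, etc.), but the shape of the problem is the same: label past invoices approve/reject and classify new ones.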
I am collecting a person's activities over a day, with timestamped data. Assume I am tracking 4 different activities the person does, plus the times an event occurs that day; the event can occur multiple times in a day. I am trying to predict the event's occurrence time in a day, using the historical data to train a model.
My model should output the time with the maximum probability of that event happening.
Please suggest what machine learning approach fits this problem.
Thanks in advance for the help on this.
If you have a background in machine learning/statistics, you should be able to complete this project in minimal time.
Here is a glimpse for you.
Machine learning has various algorithms to choose from, depending on the kind of problem you're solving. In this case, your problem can be addressed with a predictive model aimed at time-driven events, and under the hood you'd apply a regression algorithm (linear/logistic).
Such a model uses historical data to predict future events; the historical data can also be used to build a mathematical model that captures important events or trends. You can then apply the predictive model to current data to estimate what time an event will happen.
For your information, there are software packages/libraries that can help you implement the above effectively.
Hope this helps.
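As a concrete starting point for the approach above, a simple empirical baseline (all names and data here are illustrative) is to histogram the historical event times and report the modal hour together with its probability:

```python
from collections import Counter
from datetime import datetime

def most_likely_event_hour(event_timestamps):
    """Baseline for 'what time of day will the event happen?':
    histogram the historical event hours and return the modal hour
    plus its empirical probability."""
    hours = [ts.hour for ts in event_timestamps]
    hour, count = Counter(hours).most_common(1)[0]
    return hour, count / len(hours)

# Toy history: the event mostly happens around 09:00.
history = [
    datetime(2019, 8, 1, 9, 15),
    datetime(2019, 8, 1, 14, 0),
    datetime(2019, 8, 2, 9, 40),
    datetime(2019, 8, 3, 9, 5),
]
hour, prob = most_likely_event_hour(history)
```

A regression model as suggested in the answer can then try to beat this baseline by conditioning on the four tracked activities.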
I am working on a personal project in which I log data from my city's bike rental service into a MySQL database. A script runs every thirty minutes and logs, for every bike station, the number of free bikes it has. In my database I then average each station's availability for each day at that time, making it, as of today, an approximate prediction based on 2 months of logged data.
I've read a bit on machine learning and I'd like to learn a bit. Would it be possible to train a model with my data and make better predictions with ML in the future?
The answer is very likely yes.
The first step is to have some data, and it sounds like you do. You have a response (free bikes) and some features on which it varies (time, location). You have already applied a basic conditional means model by averaging values over factors.
You might augment the data you know about locations with some calendar events like holiday or local event flags.
Prepare a data set with one row per observation, and benchmark the accuracy of your current forecasting process for a period of time on a metric like Mean Absolute Percentage Error (MAPE). Ensure your predictions (averages) for the validation period do not include any of the data within the validation period!
Use the data for this period to validate other models you try.
Split off part of the remaining data into a test set, and use the rest for training. If you have a lot of data, then a common training/test split is 70/30. If the data is small you might go down to 90/10.
Learn one or more machine learning models on the training set, checking performance periodically on the test set to ensure generalization performance is still increasing. Many training algorithm implementations will manage this for you, and stop automatically when test performance starts to decrease due to overfitting. This is a big benefit of machine learning over your current straight average: the ability to learn what generalizes and throw away what does not.
Validate each model by predicting over the validation set, computing the MAPE and compare the MAPE of the model to that of your original process on the same period. Good luck, and enjoy getting to know machine learning!
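A minimal sketch of the MAPE comparison described above (the validation numbers are made up for illustration):

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Assumes no zero
    values in `actual` (division by zero otherwise)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Benchmark the current baseline (per-station averages) against a
# candidate model on the same held-out validation period.
actual    = [10, 12, 8, 11]   # free bikes actually observed
baseline  = [9, 9, 9, 9]      # current straight-average prediction
candidate = [10, 11, 9, 11]   # hypothetical ML model prediction
```

If `mape(actual, candidate)` comes out below `mape(actual, baseline)` on the validation period, the model is earning its keep.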