Customer segmentation or Recommender system? - machine-learning

I have a dataset containing multiple tables, including customer information, customer transactions and a list of reward campaigns for customers. I am trying to figure out which customers to target with a campaign next month, and I am unsure whether this is a recommender system problem (which I am honestly skeptical about, as there are no actual products from which to draw similarities) or a customer segmentation problem.
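If the problem is framed as segmentation, one common starting point (an assumption here, not something the question prescribes) is RFM: recency, frequency and monetary value per customer, computed from the transactions table. The sketch below uses plain Python with hypothetical field names and hand-picked thresholds; in practice the thresholds would be tuned, or the features fed to a clustering algorithm instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    # Hypothetical minimal transaction record.
    customer_id: str
    day: date
    amount: float


def rfm_features(transactions, as_of):
    """Compute recency (days), frequency (count) and monetary (total spend) per customer."""
    features = {}
    for t in transactions:
        if t.day > as_of:
            continue  # ignore anything after the reference date
        rec = features.setdefault(t.customer_id, {"last": t.day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], t.day)
        rec["frequency"] += 1
        rec["monetary"] += t.amount
    return {
        cid: {
            "recency_days": (as_of - f["last"]).days,
            "frequency": f["frequency"],
            "monetary": f["monetary"],
        }
        for cid, f in features.items()
    }


def segment(features, max_recency_days=90, min_frequency=3):
    """Coarse rule-based segments; thresholds and segment names are illustrative assumptions."""
    segments = {}
    for cid, f in features.items():
        active = f["recency_days"] <= max_recency_days
        loyal = f["frequency"] >= min_frequency
        if active and loyal:
            segments[cid] = "active_loyal"
        elif active:
            segments[cid] = "active_new"
        elif loyal:
            segments[cid] = "lapsing_loyal"
        else:
            segments[cid] = "lapsed"
    return segments


txns = [
    Transaction("a", date(2024, 1, 5), 20.0),
    Transaction("a", date(2024, 2, 10), 35.0),
    Transaction("a", date(2024, 3, 1), 15.0),
    Transaction("b", date(2023, 9, 1), 50.0),
]
feats = rfm_features(txns, as_of=date(2024, 3, 31))
print(segment(feats))  # {'a': 'active_loyal', 'b': 'lapsed'}
```

Segments like these could then be cross-referenced with past campaign responses to decide whom to target.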

Related

In a data warehouse, should a measure be based on a fact or a dimension?

Let's say there is a data warehouse built from a shop's data. A fact is a single purchase of a product. There is a dimension that describes a customer.
There is a need to create a measure that stores the number of distinct customers. Can this measure be based on the customer identifier in the dimension table, or does it need to be based on the fact table? In which cases is one solution better than the other?
Below I post a visualization based on the AdventureWorks2016 database:
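To make the difference between the two options concrete, here is a toy Python illustration (not tied to any particular OLAP engine; the table and column names are hypothetical). In this toy model, a distinct count over the dimension counts every customer, including those who never purchased, while a distinct count over the fact table counts only customers with at least one purchase and responds to filters on fact rows, such as a year.

```python
# Hypothetical toy star schema: a customer dimension and a sales fact table.
customer_dim = [
    {"customer_key": 1, "name": "Ann", "country": "UK"},
    {"customer_key": 2, "name": "Bob", "country": "UK"},
    {"customer_key": 3, "name": "Cid", "country": "FR"},  # never purchased
]
sales_fact = [
    {"customer_key": 1, "product": "bike", "year": 2016},
    {"customer_key": 1, "product": "helmet", "year": 2016},
    {"customer_key": 2, "product": "bike", "year": 2017},
]


def distinct_customers_from_dimension(dim, country=None):
    """Count every customer in the dimension, whether or not they ever bought anything."""
    return len({row["customer_key"] for row in dim if country is None or row["country"] == country})


def distinct_customers_from_fact(fact, year=None):
    """Count customers appearing in at least one fact row, optionally filtered by year."""
    return len({row["customer_key"] for row in fact if year is None or row["year"] == year})


print(distinct_customers_from_dimension(customer_dim))     # 3
print(distinct_customers_from_fact(sales_fact))             # 2
print(distinct_customers_from_fact(sales_fact, year=2016))  # 1
```

Which one is "right" depends on the question being asked: "how many customers do we have" versus "how many customers bought something in this slice".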

Churn Prediction Model for an online fashion company

I have been working on an individual project with an online fashion company's dataset, aiming to build a churn prediction model. To do that, I set a churn criterion: a customer is considered churned after 12 months of inactivity. However, I am unsure how to choose the time window of the data on which to train my model. Since churn periods are customer-specific, I cannot set a single date interval.
My dataset runs from 2015 to March 2018, and I thought it would be fine to select a sample of customers who made a transaction in 2016. I then took the last available date in the dataset, a day in March 2018, and looked 12 months back to identify who had churned. For the customers I selected (those with a transaction in 2016), I took all of their transaction data over the available period (2015 to 2018). I also added a binary feature indicating whether the customer made a transaction within the last 3 months. However, I feel there is a mistake here.
I am self-taught and could not find a proper guide online on how to build such a model; most churn prediction write-ups do not cover data preparation in enough detail. I hope someone can share their ideas with me.
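A common way to keep the feature period and the label period apart (offered as a general technique, not as an answer given in the thread) is to fix a cutoff date, build features only from transactions before it, and label a customer as churned if they make no transaction in the following 12 months. In the sketch below, the 3-month recency feature is measured relative to the cutoff: if it were measured at the end of the dataset, it would fall inside the same 12-month window used to define the label. The tuple input format, function name and parameter names are assumptions.

```python
from datetime import date, timedelta


def build_churn_dataset(transactions, cutoff, churn_window_days=365, recent_days=90):
    """Build per-customer features and churn labels around a single cutoff date.

    transactions: list of (customer_id, date) tuples (hypothetical format).
    Features use only transactions strictly before `cutoff`; the label looks only
    at the window [cutoff, cutoff + churn_window_days).
    """
    history = {}
    future = set()
    window_end = cutoff + timedelta(days=churn_window_days)
    for customer_id, day in transactions:
        if day < cutoff:
            history.setdefault(customer_id, []).append(day)
        elif day < window_end:
            future.add(customer_id)

    rows = []
    for customer_id, days in history.items():  # only customers active before the cutoff
        last = max(days)
        rows.append({
            "customer_id": customer_id,
            "n_transactions": len(days),
            "days_since_last": (cutoff - last).days,
            # "Recent" is measured relative to the cutoff, not to the end of the dataset.
            "bought_last_3_months": (cutoff - last).days <= recent_days,
            "churned": customer_id not in future,
        })
    return rows


txns = [
    ("c1", date(2016, 3, 1)), ("c1", date(2016, 11, 20)), ("c1", date(2017, 5, 2)),
    ("c2", date(2016, 6, 15)),
    ("c3", date(2017, 4, 1)),  # first seen after the cutoff: excluded from this snapshot
]
for row in build_churn_dataset(txns, cutoff=date(2017, 1, 1)):
    print(row)
```

Repeating this for several cutoff dates and stacking the rows is one way to get more training examples without letting labels leak into features.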

Not a single recommendation available with Apache Mahout

I have tested user-based recommendations with Apache Mahout, and they work well with the provided sample data.
However, with my own data I am not able to get a single recommendation. I believe this is because the data are too sparse, but I would appreciate an expert's advice ;)
My data contain only purchase history, so I gave a rating of 4.0 to every user ID <-> product ID purchase.
Here is the data file : http://we.tl/RcR83vcHQI
Could you give me some advice on getting useful recommendations?
Thanking you in advance.
This is a common problem for people new to Mahout. Versions 0.9 and earlier require your IDs to be sequential, contiguous, non-negative integers. This applies to both user and item IDs. Mahout uses them as the row and column numbers of the matrix of all input.
There are several ways to tackle this, such as keeping HashBiMaps (from Guava collections) for user and item IDs. When you see the first ID, assign it Mahout ID 0 and store the relationship in the map. Keep looking through your IDs; when you find the next unique one, assign it Mahout ID 1, and so on.
Then you'll get Mahout IDs back from the recommender. You can use the bidirectional HashBiMap to translate them back into your application-specific IDs.
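The same mapping idea can be sketched in Python, with two plain containers standing in for the Guava HashBiMap. This illustrates the technique only; it is not Mahout code, and the class name and sample IDs are hypothetical. Each previously unseen ID gets the next integer, so the resulting IDs start at 0 and are contiguous.

```python
class IdMapper:
    """Bidirectional map between arbitrary application IDs and contiguous 0-based integers."""

    def __init__(self):
        self._to_index = {}     # application ID -> contiguous ID
        self._to_external = []  # contiguous ID -> application ID (list position is the ID)

    def index_of(self, external_id):
        """Return the contiguous ID for external_id, assigning the next free one on first sight."""
        if external_id not in self._to_index:
            self._to_index[external_id] = len(self._to_external)
            self._to_external.append(external_id)
        return self._to_index[external_id]

    def external_of(self, index):
        """Translate a contiguous ID (e.g. one returned by the recommender) back to the application ID."""
        return self._to_external[index]


users, items = IdMapper(), IdMapper()
purchases = [("user-938", "sku-A17"), ("user-12", "sku-B02"), ("user-938", "sku-B02")]
# (userIndex, itemIndex, preference), using the constant 4.0 the question assigns to every purchase.
rows = [(users.index_of(u), items.index_of(i), 4.0) for u, i in purchases]
print(rows)                  # [(0, 0, 4.0), (1, 1, 4.0), (0, 1, 4.0)]
print(items.external_of(1))  # sku-B02
```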
BTW, Mahout (1.0-snapshot or later) now has a completely new generation of recommender that uses a search engine to serve recommendations and Mahout to calculate the model. It takes your input directly and does the ID translation internally. It has many benefits over the older Hadoop version, including:
Multimodal: it can ingest many different user actions on many different item sets. This allows you to use much of the user's clickstream to make recommendations.
Realtime results: it serves results from a very fast, scalable server in Solr or Elasticsearch.
Due to its realtime nature, it can make recommendations for new users or users with very recent history. The older Hadoop Mahout recommenders only recommend to users and items in the training data; they cannot react to history that was not used in training. The new recommender can use data gathered in realtime, even for new users.
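The reason a search-based design can serve new users is that the model (which items' indicators point to which other items) is computed offline, while the user's recent history is used only as a query at request time. The toy Python sketch below mimics that split with a simple overlap count instead of a real Solr or Elasticsearch query; the indicator data, the scoring rule and the function name are assumptions, and a real search engine would apply its own relevance scoring.

```python
from collections import Counter

# Hypothetical indicator data, as an offline model-building step might produce:
# for each item, the items whose co-occurrence with it was judged significant.
indicators = {
    "tent": {"sleeping_bag", "stove"},
    "stove": {"tent", "fuel"},
    "fuel": {"stove"},
    "sleeping_bag": {"tent", "pillow"},
    "pillow": {"sleeping_bag"},
}


def recommend(recent_history, indicators, k=3):
    """Rank items by how many of the user's recent items appear in their indicator set.

    Crude stand-in for the search-engine query: the recent history is the query,
    and each item's indicator set plays the role of an indexed document field.
    """
    seen = set(recent_history)
    scores = Counter()
    for item, indicator_set in indicators.items():
        if item in seen:
            continue  # don't recommend what the user already interacted with
        overlap = len(indicator_set & seen)
        if overlap:
            scores[item] = overlap
    return [item for item, _ in scores.most_common(k)]


# A brand-new user with two clicks in this session still gets results,
# because nothing about this user had to exist when the indicators were built.
print(recommend(["tent", "fuel"], indicators))  # ['stove', 'sleeping_bag']
```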
The new Multimodal Recommender is described here:
Mahout site
A free ebook, which talks about the general idea: Practical Machine Learning
A slide deck, which talks about mixing actions or other indicators: Creating a Unified Multimodal Recommender
Two blog posts: What's New in Recommenders: part #1 and What's New in Recommenders: part #2
A post describing the log-likelihood ratio: Surprise and Coincidence. LLR is used to reduce noise in the data while keeping the calculations at O(n) complexity.
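For reference, the log-likelihood ratio can be computed from a 2x2 table of co-occurrence counts for a pair of items A and B. Below is a minimal Python sketch of the standard entropy formulation of Dunning's G^2 statistic. The function names are hypothetical, and this is not Mahout's own code. Counts consistent with independence score near zero, while counts that deviate strongly from independence score high, which is what makes the statistic useful for filtering out noisy co-occurrences.

```python
import math


def _x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def _unnormalized_entropy(*counts):
    """N*log(N) - sum(k*log(k)), i.e. N times the Shannon entropy (natural log) of the counts."""
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: users who interacted with both A and B
    k12: users with A but not B
    k21: users with B but not A
    k22: users with neither
    """
    row_entropy = _unnormalized_entropy(k11 + k12, k21 + k22)
    column_entropy = _unnormalized_entropy(k11 + k21, k12 + k22)
    matrix_entropy = _unnormalized_entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


print(log_likelihood_ratio(10, 10, 10, 10))  # independent counts: close to 0
print(log_likelihood_ratio(100, 0, 0, 100))  # perfectly associated: 400*ln(2), about 277.26
```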

Can we predict the dates where each customers is to make transaction(s)?

I came across a project with a dataset containing variables such as customer IDs, the dates customers purchased products, the types of products they purchased, and product prices. I want to predict on what date each customer is likely to make a transaction and which product they are likely to purchase. Dates could be at the level of days, weeks, or months.
From my understanding, I'll have to split the problem into separate models: the first predicting the product(s) that EACH customer will purchase, and the second predicting the date of the transaction that is likely to occur for EACH customer. Obviously, for the first model we should use classification models. I am not sure which model to use for the second one. It could be time series, but I have never predicted dates before. I hope I am on the right track.
Main questions are:
Can we predict dates with any machine learning technique, at the level of days, weeks, or months?
Can we predict both the dates and the products each customer is going to purchase, or do we need to split the problem into separate models?
Suggestions will be very much appreciated!
Check out the BTYD package:
http://cran.r-project.org/web/packages/BTYD/vignettes/BTYD-walkthrough.pdf
It uses Bayesian models of customer purchase behaviour, both at the individual customer level and in aggregate. It can certainly address the "when" part of your problem. Regarding "which products", I suspect you could separately model the purchasing process for a particular product (or set of products).
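As a point of comparison for a BTYD model, a much simpler heuristic predicts a customer's next purchase date from their own inter-purchase gaps. This is not a BTYD model: it ignores the possibility that the customer has stopped buying altogether, which is exactly what BTYD models account for. A Python sketch; the function name is hypothetical.

```python
from datetime import date, timedelta
from statistics import median


def naive_next_purchase(purchase_dates):
    """Predict the next purchase as last purchase + median gap (in days) between past purchases.

    Returns None when there are fewer than two distinct purchase dates (no gap to estimate from).
    """
    days = sorted(set(purchase_dates))
    if len(days) < 2:
        return None
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return days[-1] + timedelta(days=round(median(gaps)))


history = [date(2023, 1, 1), date(2023, 1, 31), date(2023, 3, 2), date(2023, 4, 15)]
print(naive_next_purchase(history))  # 2023-05-15
```

A baseline like this gives a reference error against which a more principled model can be judged.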

Applying AI, recommendation or machine learning techniques to search feature

I'm new to AI, machine learning, recommendation engines and data mining, but I would like to find a way into the area.
I'm working on a conference room booking application that will recommend meeting rooms to employees at the time and location it calculates to be most suitable. The recommendations are based on criteria that an employee enters before submitting a search. The criteria can include meeting attendees (who can be in different locations and time zones), room capacity (based on attendees) and types of equipment required.
The recommendation engine will take time zones and locations into account and recommend one or more meeting rooms, depending on whether employees are in different buildings or geographical regions.
Can anyone recommend recommendation engine, machine learning or AI techniques that I could apply to this problem? I'm new to this area, so all suggestions are greatly appreciated.
This looks more like an optimization problem: you have some hard constraints and some preferences. Look at linear programming. Also search for constraint-based scheduling; there are several tutorials.
Just a warning: this is in general an NP-hard problem, so unless you are solving it for a small number of participants, you will need to use heuristics and approximations. If you want to go a little overboard, there is a Coursera class on optimization running right now.
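To make the "hard constraints plus preferences" framing concrete, here is a tiny brute-force sketch in Python rather than a linear-programming or constraint-solver model. All data, names, the 9:00 to 17:00 working-hours rule and the "fewest wasted seats" preference are hypothetical. It enumerates every candidate hour and every combination of one room per site, so its cost grows with the product of feasible rooms across sites; that blow-up is why larger instances call for heuristics or a proper solver.

```python
from itertools import product

# Hypothetical toy data.
rooms = {
    "LDN-1": {"site": "London", "capacity": 4, "equipment": {"video"}},
    "LDN-2": {"site": "London", "capacity": 10, "equipment": {"video", "whiteboard"}},
    "NYC-1": {"site": "New York", "capacity": 6, "equipment": {"video"}},
}
busy = {("LDN-1", 15)}                  # (room, UTC hour) pairs already booked
attendees = {"London": 3, "New York": 2}
utc_offsets = {"London": 0, "New York": -5}
required_equipment = {"video"}
candidate_hours_utc = range(8, 20)


def feasible_rooms(site, hour):
    """Rooms at `site` that satisfy every hard constraint for the given UTC hour."""
    return [
        name for name, r in rooms.items()
        if r["site"] == site
        and r["capacity"] >= attendees[site]
        and required_equipment <= r["equipment"]
        and (name, hour) not in busy
    ]


def best_booking():
    """Exhaustively search hours and room combinations; prefer the fewest wasted seats."""
    best = None
    sites = sorted(attendees)
    for hour in candidate_hours_utc:
        # Hard constraint: the hour must fall within 9:00-17:00 local time at every site.
        if not all(9 <= (hour + utc_offsets[s]) % 24 < 17 for s in sites):
            continue
        for combo in product(*(feasible_rooms(s, hour) for s in sites)):
            wasted = sum(rooms[name]["capacity"] - attendees[s] for name, s in zip(combo, sites))
            if best is None or wasted < best[0]:
                best = (wasted, hour, dict(zip(sites, combo)))
    return best


print(best_booking())  # (5, 14, {'London': 'LDN-1', 'New York': 'NYC-1'})
```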

Resources