Machine Learning: How to learn MTG draft card game

I have become familiar with many different approaches to machine learning, but I am having trouble identifying which approach is most appropriate for my particular (fun) problem. (i.e., is this a supervised learning problem, and if so, what are my input x and output y?)
A Magic: The Gathering draft consists of several players sitting around a table, each holding a pack of 15 cards. Each player picks a card and passes the remainder of the pack to the player next to them. When the packs are empty, everyone opens a new pack and repeats, for 3 packs in total (45 decisions). Players end up with a deck which they use to compete.
I am having trouble understanding how to structure the data I have into the trials needed for learning. I want a solution that 1) builds knowledge about which cards are picked relative to the cards picked previously, and 2) can then be used to decide which card to pick from a given new pack.
I've got a dataset of human choices I'd like to learn from. It also includes information on cards players picked but ultimately discarded. What might be an elegant way to structure this data for learning; i.e., what are my features?

These kinds of problems are usually tackled by reinforcement learning, planning, and Markov decision processes. Thus this is not a typical supervised/unsupervised learning scheme. This is rather about learning to play something, i.e., to interact with the environment (rules of the game, chance, etc.). Take a look at methods like:
Q-learning
SARSA
UCT
In particular, the great book by Sutton and Barto, "Reinforcement Learning: An Introduction", is a good starting point.
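To make the flavor of these methods concrete, here is a minimal sketch of the tabular Q-learning update (the core idea covered in Sutton and Barto). The state and action encodings are deliberately left abstract: a raw draft state would need heavy abstraction before a lookup table is feasible.

from collections import defaultdict

alpha, gamma = 0.1, 0.9          # learning rate, discount factor
Q = defaultdict(float)           # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, next_actions):
    """One Q-learning step after taking `action` in `state`."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])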

Yes, you can train a model to handle this -- eventually -- with either supervised or unsupervised learning. The problem is the sheer number of factors and local stable points. Unavoidably, the input is the state of the game: cards already picked by all players (especially the current state of the AI's deck) and those available from the pack in hand.
Your output should be, as you say, the card chosen out of those available. I feel that you have a combinatorial explosion that will require either massive amounts of data or a simplification of the card features, to allow the algorithm to extract a meaning deeper than "choose card X out of this set of 8."
In practice, you may want the model to score the available choices rather than simply picking a particular card: return rankings, fitness metrics, or probabilities of picking each particular card.
You can supply some supervision in your choice of input organization. For instance, you can provide each card as a set of characteristics rather than simply a card ID, giving the algorithm a chance to "understand" building a consistent deck type.
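As a rough sketch of that idea (the feature names, the dict-based card representation, and the scikit-learn-style `model` are all assumptions for illustration): describe each card by its characteristics, summarize the deck so far, and rank the cards in the pack.

import numpy as np

def card_features(card):
    # hypothetical characteristics: mana cost, color flag, power, rarity ...
    return np.array([card["cost"], card["is_red"], card["power"], card["rarity"]],
                    dtype=float)

def rank_pack(model, deck_so_far, pack):
    """Score every available card given the deck built so far; best first."""
    deck_vec = (np.mean([card_features(c) for c in deck_so_far], axis=0)
                if deck_so_far else np.zeros(4))
    scores = [float(model.predict(
                  np.concatenate([deck_vec, card_features(c)])[None, :])[0])
              for c in pack]
    return sorted(zip(scores, [c["name"] for c in pack]), reverse=True)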
You might also wish to put in some work to abstract (i.e., simplify) the current game state, such as writing evaluation routines to summarize the other decks being built. For instance, if there are 6 players in the group, and your right-hand opponent (RHO) and the player opposite him are both building burn decks, you don't want to do the same: the RHO will take the best burn cards in 5 of the 6 packs passed around, leaving you with 2nd (or 3rd) choice.
As for the algorithm ...
A neural network will explode with this many input variables. You'll want something simpler that matches your input data. If you go with abstracted properties, you might consider a simple probabilistic classifier (Naive Bayes) or a tree-based method (Random Forest, etc.). You might also go for a collaborative filtering model, given the similarities between situations.
I hope this helps launch you toward designing your features. Do note that you're attacking a complex problem: one of the features that can make a game popular with humans is that its decision-making is hard to automate.

Every single "pick" is a decision, with the input information being A (what you already have) and B (what the available choices are).
Thus, a machine that decides "whether you should pick this card" can be a simple binary classifier, given the input A + B + (the card in question).
For example, pack 1, pick 2 basically provides 1 "yes" (the card picked) and 13 "no"s (the cards not picked): 14 rows of training data in total.
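A minimal sketch of that expansion, where `featurize` is a hypothetical function mapping (deck so far, pack, candidate card) to a feature vector:

def pick_to_rows(deck_so_far, pack, picked_card, featurize):
    """Expand one observed pick into binary training rows (1 yes, n-1 no)."""
    rows, labels = [], []
    for card in pack:
        rows.append(featurize(deck_so_far, pack, card))
        labels.append(1 if card == picked_card else 0)
    return rows, labels   # pack 1, pick 2 yields 14 rows, exactly one labeled 1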
We may want to weight these training data according to which pick it is. (When fewer cards are left, the choice might be less important than when there are more choices.)
We may also want to weight these training data according to the rarity of cards.
Finally, the main challenge is that the input data (the features), A + B + card, are inappropriate unless we apply a smart transformation. (Simply treating each card as a categorical variable and one-hot encoding it leads to something that is too big and has very low information density; that will almost certainly fail.)
This challenge can be resolved by making it a two-stage process: first we vectorize the cards, then we build features based on the vectors.
http://www.cs.toronto.edu/~mvolkovs/recsys2018_challenge.pdf
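One possible (but by no means the only) way to implement the first stage is to learn dense card vectors word2vec-style, treating each finished deck as a "sentence" of card names; the toy decks and gensim hyperparameters below are illustrative assumptions.

from gensim.models import Word2Vec
import numpy as np

# toy "sentences": each finished deck is a sequence of card names
decks = [["Lightning Bolt", "Shock", "Mountain"],
         ["Counterspell", "Opt", "Island"]]
card2vec = Word2Vec(sentences=decks, vector_size=32, window=50,
                    min_count=1, sg=1).wv

def pick_features(deck_so_far, candidate):
    """Stage 2: build pick features from the learned card vectors."""
    deck_vec = np.mean([card2vec[c] for c in deck_so_far], axis=0)
    return np.concatenate([deck_vec, card2vec[candidate]])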

Related

Is reinforcement learning suitable for predicting bias in dice?

I would like to analyze a problem similar to the following.
Problem:
You will be given N dice.
You will be given a lot of data about each die (e.g., surface information, material information, location of the center of gravity, etc.).
The features of the dice are randomly generated every game, and the dice are thrown at the same speed, angle, and initial position.
As a result of rolling a die, you get 1 point if you get a 6 and 0 points otherwise.
There are training data from 100,000 games (die data and match results).
I would like to learn a rule for selecting only dice whose probability of rolling a 6 is higher than 1/6.
I apologize for the vague problem statement.
First of all, it was my mistake to write "N dice". The dice actually come one at a time:
A single die with random characteristics is distributed.
When it is rolled, whether a 6 came up or not is recorded.
It would have been easier to understand had I phrased it as: "there are 100,000 [characteristics, result] data points."
If you get something other than a 6, you get -1 point.
If you get a 6, you get +5 points.
Example:
X: feature vector of a die
f: the function I want to learn
f: X -> [0, 1]
(if f(X) > 0.5, I pick this die.)
For example, a die with a 1/5 chance of rolling a 6 comes up non-6 4 out of 5 times, so I wondered whether it would be better to give an immediate reward per roll.
Is it good to decide the reward based on the number of points after 100,000 games?
I have read about some general reinforcement learning methods, but they involve the concept of state transitions, and there are no state transitions in this game. (Each game ends in 1 step, and each game is independent.)
I am a student just learning neural networks from scratch. It would help if you could give me a hint. Thank you.
By the way, I think the result of this learning can be summarized as: "it is good to choose the die whose face farthest from the center of gravity is the 6."
Let's first talk about reinforcement learning.
Problem setups, in order of increasing generality:
Multi-Armed Bandit - no state, just actions with unknown rewards
Contextual Bandit - rewards also depend on some context (state)
Reinforcement Learning (MDP) - actions can also influence the next state
Common to all three is that you want to maximize the sum of rewards over time, and there is an exploration vs. exploitation trade-off. You are not simply given a large dataset: if you want to know what the best action is, you have to try it a few times and observe the reward, which may cost you some reward you could have earned otherwise.
Of those three, the contextual bandit is the closest match to your setup, although it doesn't quite match your goals. It would go like this: given some properties of the die (the context), select the best die from a group of possible choices (the action, e.g., the output of your network), such that you get the highest expected reward. At the same time you are also training your network, so you sometimes have to pick dice with bad or unknown properties in order to explore them.
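A rough sketch of that epsilon-greedy loop; the reward model with a scikit-learn-style `predict` is an assumption:

import random

def choose_die(model, dice_contexts, epsilon=0.1):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.randrange(len(dice_contexts))               # explore
    predicted = [model.predict([ctx])[0] for ctx in dice_contexts]
    return max(range(len(predicted)), key=predicted.__getitem__)  # exploit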
However, there are two reasons why it doesn't match:
You already have data from 100,000 games and do not seem interested in minimizing the cost of trial and error to acquire more data. You assume this data is representative, so no exploration is required.
You are only interested in prediction. You want to classify the dice into "good for rolling a 6" vs. "bad". This piece of information could later be used to make a decision between different choices, if you know the cost of making a wrong decision. If you are just learning f() because you are curious about a property of the dice, then this is a pure statistical prediction problem: you don't have to worry about short- or long-term rewards, and you don't have to worry about the selection or consequences of any actions.
Because of this, you actually have only a supervised learning problem. You could still solve it with reinforcement learning, because RL is more general, but your RL algorithm would waste a lot of time figuring out that it really cannot influence the next state.
Supervised Learning
Your die actually behaves like a biased coin: each roll is a Bernoulli trial with ~1/6 success probability. This makes it a standard classification problem: given your features, predict the probability that a die will lead to a good match result.
It seems that your "match results" can easily be converted into the number of rolls and the number of positive outcomes (rolled a six) for the same die. If you have a large number of rolls for every die, you can simply classify the die and use this class (together with the physical properties) as one data point to train your network.
You can do fancier things if you have fewer rolls, but I won't go into that. (If you are interested, have a look at the beta distribution and at how the cross-entropy loss works with neural networks.)
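A minimal sketch of the classification route described above, with synthetic data standing in for the real die features (the logistic regression is an assumption; a small neural network trained with cross-entropy would fill the same role):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((100_000, 8))                     # physical die features (toy)
p_six = 1 / (1 + np.exp(-(X[:, 0] - 0.5))) / 3   # toy link: feature 0 biases the die
rolls = rng.binomial(n=60, p=p_six)              # sixes observed in 60 rolls each
y = (rolls / 60 > 1 / 6).astype(int)             # label: is this a "good" die?

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:5])[:, 1])            # P(good) for the first 5 dice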

Unsupervised Learning

I am working on a final-year project which has to be coded using unsupervised learning (the k-means algorithm). It is to predict a suitable game from various games with regard to cognitive skill levels. The skills are concentration, response time, memorization, and attention.
The first problem is that I cannot find a proper dataset that contains the skills and games. Then, I am not sure how to find the clusters. Is there any way to find a proper dataset, and how do I cluster it?
Furthermore, how can I do it without a dataset (without using reinforcement learning)?
Thanks in advance
First of all, I am somewhat confused by your question, but I will try to answer to the best of my ability. k-means clustering is an unsupervised clustering method based on the distance (typically Euclidean) between data points. Data points with similar features have a smaller distance between them and are clustered into the same cluster.
I assume you are trying to build an algorithm that outputs a recommended game given an individual's concentration, response time, memorization, and attention skills.
The first problem is that I cannot find a proper dataset that contains the skills and games.
For the data set, you can literally build your own that looks like this:
labels = [game]
features = [concentration, response time, memorization, attention]
Labels is an n-by-1 vector, where n is the number of games. Features is an n-by-4 matrix, and each skill can have a rating from 1 to 5, 5 being the highest. Then populate it with your favorite classic games.
For example, Tetris can be your first game, and you add it to your data set like this:
label = [Tetris]
features = [5, 2, 1, 4]
You need a lot of concentration and attention in Tetris, but you don't need a good response time because the blocks are slow, and you don't need to memorize anything.
Then, I am not sure how to find the clusters.
You first have to determine which distance you want to use, e.g., Manhattan or Euclidean. Then you need to decide on the number of clusters. The k-means algorithm itself is very simple; the following video explains it well: https://www.youtube.com/watch?v=_aWzGGNrcic
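A minimal sketch with scikit-learn, using made-up skill ratings in the spirit of the Tetris example above:

import numpy as np
from sklearn.cluster import KMeans

games = ["Tetris", "Chess", "Simon", "Snake"]
# columns: concentration, response time, memorization, attention (1-5)
X = np.array([[5, 2, 1, 4],
              [5, 1, 3, 5],
              [3, 4, 5, 4],
              [3, 5, 1, 3]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for game, cluster in zip(games, km.labels_):
    print(game, "-> cluster", cluster)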
Furthermore, how can I do it without a dataset (without using reinforcement learning)?
This question makes no sense: if you have no data, what would you cluster? Imagine your friends asking you to separate the green apples from the red apples, but they never gave you any apples... How could you possibly cluster them? It is impossible.
Second, I'm not sure what you mean by reinforcement learning in this case. Reinforcement learning is about an agent existing in an environment and learning how to behave optimally in that environment to maximize its cumulative reward; for example, a human going into a casino and trying to make the most money. It has nothing to do with datasets.

Supervised learning: linear regression

I am confused about how linear regression works in supervised learning. I want to generate an evaluation function for a board game using linear regression, so I need both input data and output data. The input data is my board state, and I need the corresponding value for each state, right? But how can I get this expected value? Do I need to write an evaluation function myself first? I thought I needed to generate the evaluation function by using linear regression, so I'm a little confused.
It's supervised learning after all, meaning you will need both input and output.
Now the question is: how to obtain these? And this is not trivial!
Candidates are:
historical data (e.g., online-play history)
some form of self-play / reinforcement learning (more complex)
But then a new problem arises: what output is available, and what kind of input will you use?
If there were some a-priori implemented AI, you could just take its scores. But with historical data, for example, you only get -1, 0, 1 (A wins, draw, B wins), which makes learning harder (and this touches the credit assignment problem: there might be one move which made someone lose, and it's hard to understand which of 30 moves led to the result of 1). This is also related to the input. Take chess, for example, and take a random position from some online game: there is a chance this position is unique across 10 million games (or at least rare), which conflicts with the expected performance of your approach. I assumed here that the input is the full board position. This changes for other inputs, e.g., chess material, where the input is just a histogram of pieces (3 of these, 2 of those). Now there are far fewer unique inputs and learning will be easier.
Long story short: it's a complex task with a lot of different approaches, and much of it is bound to your exact task. A linear evaluation function is not uncommon in reinforcement learning approaches. You might want to read some of the literature on these (this function is a core component: e.g., table lookup vs. linear regression vs. neural network to approximate the value or policy function).
I might add that your task points toward the self-learning approach to AI, which is very hard; it's a topic that gained additional popularity in recent years (there were successes before: see Backgammon AI). But all of these approaches are highly complex, and a good understanding of RL and of mathematical basics like Markov decision processes is important.
For more classic hand-made evaluation-function-based AIs, a lot of people used an additional regressor for tuning / weighting already-implemented components; there is an overview at the chessprogramming wiki. (The chess-material example from above is a good one: the assumption is that more pieces are better than fewer, but it's hard to assign them values. A sketch of that tuning idea follows.)
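A toy sketch: with a piece-count histogram as input, linear regression recovers per-piece weights from game outcomes. The data here is synthetic and the outcome proxy is an assumption.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# columns: pawn, knight, bishop, rook, queen count differences (side to move)
X = rng.integers(-8, 9, size=(1000, 5))
true_values = np.array([1, 3, 3, 5, 9])            # classic piece values
y = X @ true_values + rng.normal(0, 2, size=1000)  # noisy game-outcome proxy

model = LinearRegression().fit(X, y)
print(model.coef_)   # should land near [1, 3, 3, 5, 9]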

Using decision tree in Recommender Systems

I have a decision tree trained on the columns (Age, Sex, Time, Day, Views, Clicks), which classifies into two classes, Yes or No, representing the buying decision for an item X.
Using these values, I'm trying to predict the probability of purchase for 1000 samples (customers) which look like
('12', 'Male', '9:30', 'Monday', '10', '3'),
('50', 'Female', '10:40', 'Sunday', '50', '6'),
...
I want to get an individual probability or score which will help me recognize which customers are most likely to buy item X. I want to be able to sort the predictions and show a particular item to only the 5 customers most likely to buy it.
How can I achieve this?
Will a decision tree serve the purpose?
Is there any other method?
I'm new to ML so please forgive me for any vocabulary errors.
Using a decision tree with a small sample set, you will definitely run into overfitting problems, especially at the lower levels of the tree, where you have exponentially less data to train your decision boundaries. Your dataset should have far more samples than the number of categories, and enough samples for each category.
Speaking of decision boundaries, make sure you understand how you are handling the data type of each dimension. For example, 'sex' is categorical, while 'age', 'time of day', etc. are real-valued inputs (discrete or continuous), so different parts of your tree will need different formulations. Otherwise, your model might end up handling 9:30, 9:31, 9:32, ... as separate classes.
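A small sketch of such preprocessing with scikit-learn (the encoding choices, one-hot for the categorical columns and raw numbers for the rest, are an assumption):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"age": [12, 50], "sex": ["Male", "Female"],
                   "hour": [9.5, 10.67],            # 9:30 -> 9.5, a real value
                   "day": ["Monday", "Sunday"],
                   "views": [10, 50], "clicks": [3, 6]})

pre = ColumnTransformer(
    [("cat", OneHotEncoder(), ["sex", "day"])],     # categorical columns
    remainder="passthrough")                        # keep numeric columns as-is
X = pre.fit_transform(df)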
Try some other algorithms, starting with simple ones like k-nearest neighbours (KNN). Keep a validation set to test each algorithm. Use Matlab (or similar software) where libraries let you quickly try different methods and see which one works best. There is not enough information here to recommend something very specific.
In particular, KNN is able to capture affinity in the data. Say product X is bought by people around age 20, during evenings, after about 5 clicks on the product page; KNN will tell you how close each new customer is to the customers who bought the item, and based on this you can just pick the top 5. It is very easy to implement and works great as a benchmark for more complex methods.
(Here I assume 'views' and 'clicks' mean the number of views and clicks by each customer on product X.)
A decision tree is a classifier, and in general it is not suitable as the basis for a recommender system. But given that you are only predicting the likelihood of buying one item, not tens of thousands, it makes sense to use a classifier here.
You simply score all of your customers and retain the 5 whose probability of buying X is highest, yes; see the sketch below. Is there any more to the question?
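A minimal sketch of that scoring step with scikit-learn, on synthetic stand-in data:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 6))                 # toy training features
y_train = rng.integers(0, 2, 200)              # toy Yes/No labels
X_customers = rng.random((1000, 6))            # the 1000 customers to score

clf = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
proba_buy = clf.predict_proba(X_customers)[:, 1]   # P("Yes") per customer
top5 = np.argsort(proba_buy)[::-1][:5]             # the 5 most likely buyers
print(top5, proba_buy[top5])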

Machine learning: how to use a user's Facebook interests to make a decision

I'm trying to figure out a way to represent a Facebook user as a vector. I decided to go with stacking the different attributes/parameters of the user into one big vector (e.g., age is a vector of size 100, where 100 is the maximum age; if you are, say, 50, the first 50 values of the vector are 1, like a thermometer encoding). I just can't figure out a way to represent the Facebook interests as a vector too: they are a collection of words, and the space of all words is huge, so I can't go for a model like bag of words or something similar. Does anyone know how I should proceed? I'm still new to this; any reference would be highly appreciated.
If you are inclined to downvote this question, please let me know what is wrong with it so that I can improve the wording and context.
Thanks
The "right" approach depends on what your learning algorithm is and what the decision problem is.
It would often be better, though, to represent age as a single numeric feature rather than 100 indicator features. That way learning algorithms don't have to learn the relationship between those hundred features (it's baked in), and the problem has 99 fewer dimensions, which will make everything better.
To model the interests, you might want to start with an extremely high-dimensional bag-of-words model and then reduce its dimensionality with one of various options:
a general dimensionality-reduction technique like PCA, or smarter nonlinear ones such as kernel PCA: see Wikipedia's overview of dimensionality reduction and of specifically nonlinear techniques
pass the interests through a topic model and use the learned topic weights as your features; examples include LSA, LDA, HDP, and many more (a sketch of this option follows)
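As a sketch of the topic-model option (the interest strings and topic count are invented for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

users_interests = ["football tennis running",
                   "machine learning chess programming",
                   "cooking baking running"]

bow = CountVectorizer().fit_transform(users_interests)   # very high-dimensional
lda = LatentDirichletAllocation(n_components=2, random_state=0)
user_features = lda.fit_transform(bow)                   # n_users x n_topics
print(user_features)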
