Is there a mapping between direct/indirect and supervised/unsupervised/reinforcement learning? To me it looks like direct learning ≈ supervised learning and indirect learning ≈ reinforcement learning, but I couldn't find a good reference for this.
Both direct and indirect learning can be unsupervised (that's how I usually see them), drawing conclusions from existing data alone.
Direct learning refers to hard facts logically implied by the data. For instance, given a database of international football competitions, you could ask who has scored the most international goals in their career (Abby Wambach; Ali Daei on the men's side).
Indirect learning refers to inferences drawn from the data. For instance, given a database of movie reviews, you could identify clusters of users who rate action movies similarly, and use those correlations to predict how one member might rate a particular movie they have not yet seen but others have rated.
I am working on a final-year project that has to be coded using unsupervised learning (the k-means algorithm). It is meant to predict a suitable game from various games based on a player's cognitive skill levels. The skills are concentration, response time, memorization, and attention.
The first problem is that I cannot find a proper dataset that contains the skills and games. I am also not sure how to find the clusters. Is there any way to find a proper dataset, and how do I cluster it?
Furthermore, how can I do it without a dataset (without using reinforcement learning)?
Thanks in advance
First of all, I am kind of confused by your question, but I will try to answer to the best of my abilities. K-means clustering is an unsupervised clustering method based on the distance (typically Euclidean) between data points. Points with similar features lie closer to each other and are therefore grouped into the same cluster.
I assume you are trying to build an algorithm that outputs a recommended game, given an individual's concentration, response time, memorization, and attention skills.
The first problem is that I cannot find a proper dataset that contains the skills and games.
For the data set, you can literally build your own that looks like this:
labels = [game]
features = [concentration, response time, memorization, attention]
Labels is an n by 1 vector, where n is the number of games. Features is an n by 4 matrix, and each skill can have a range of 1 - 5, 5 being the highest. Then populate it with your favorite classic games.
For example, Tetris can be your first game, and you add it to your data set like this:
label = [Tetris]
features = [5, 2, 1, 4]
You need a lot of concentration and attention in Tetris, but you don't need a good response time because the blocks are slow, and you don't need to memorize anything.
I am also not sure how to find the clusters.
You first have to determine which distance metric you want to use, e.g. Manhattan or Euclidean. Then you need to decide on the number of clusters. The k-means algorithm itself is very simple; this video explains it well: https://www.youtube.com/watch?v=_aWzGGNrcic
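As a minimal sketch of the whole idea (assuming scikit-learn is available; the games and skill scores below are invented purely for illustration):

```python
# Hypothetical sketch: cluster hand-built game skill profiles with k-means.
# All games and skill scores are made up for illustration.
import numpy as np
from sklearn.cluster import KMeans

games = ["Tetris", "Chess", "Whac-A-Mole", "Memory", "Sudoku", "Pinball"]
# columns: concentration, response time, memorization, attention (scale 1-5)
features = np.array([
    [5, 2, 1, 4],  # Tetris
    [5, 1, 3, 5],  # Chess
    [2, 5, 1, 3],  # Whac-A-Mole
    [3, 1, 5, 4],  # Memory
    [5, 1, 2, 4],  # Sudoku
    [2, 5, 1, 3],  # Pinball
])

# Group the games into 3 clusters by Euclidean distance
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
for game, label in zip(games, kmeans.labels_):
    print(game, "-> cluster", label)

# To recommend a game, assign the player's skill profile to its
# nearest cluster and suggest games from that cluster:
player = np.array([[4, 2, 2, 4]])
print("player belongs to cluster", kmeans.predict(player)[0])
```

Once the games are clustered, a recommendation is simply a game drawn from the cluster nearest the player's own skill profile.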
Furthermore, how can I do it without a dataset (Without using reinforcement learning)?
This question makes no sense because, first of all, if you have no data, what is there to cluster? Imagine your friends asking you to separate green apples from red apples, but they never gave you any apples... How could you possibly cluster them? It is impossible.
Second, I'm not sure what you mean by reinforcement learning in this case. Reinforcement learning is about an agent existing in an environment and learning how to behave optimally in that environment to maximize its cumulative reward. For example, a human going into a casino and trying to make the most money. It has nothing to do with datasets.
I have become familiar with many different approaches to machine learning, but I am having trouble identifying which approach might be most appropriate for my given fun problem (i.e., is this a supervised learning problem, and if so, what are my input x and output y?).
A Magic: The Gathering draft consists of several players sitting around a table, each holding a pack of 15 cards. Each player picks a card and passes the remainder of the pack to the player next to them. They then open a new pack and repeat this, for 3 rounds in total (45 decisions). People end up with a deck which they use to compete.
I am having trouble understanding how to structure the data I have into trials which are needed for learning. I want a solution that 1) builds knowledge about which cards are picked relative to the previous cards that are picked 2) can then be used to make a decision about which card to pick from a given new pack.
I've got a dataset of human choices I'd like to learn from. It also includes information on cards they picked but ultimately discarded. What might be an elegant way to structure this data for learning; that is, what are my features?
These kinds of problems are usually tackled by reinforcement learning, planning, and Markov decision processes, so this is not a typical supervised/unsupervised learning setting. It is rather about learning to play something, i.e., to interact with the environment (rules of the game, chances, etc.). Take a look at methods like:
Q-learning
SARSA
UCT
In particular, the great book by Sutton and Barto, "Reinforcement Learning: An Introduction", can be a good starting point.
Yes, you can train a model to handle this -- eventually -- with either supervised or unsupervised learning. The problem is the quantity of factors and local stable points. Unfortunately, the input at this point is the state of the game: cards picked by all players (especially the current state of the AI's deck) and those available from the pack in hand.
Your output result should be, as you say, the card chosen ... out of those available. I feel that you have a combinatorial explosion that will require either massive amounts of data, or simplification of the card features to allow the algorithm to extract a meaning deeper than "Choose card X out of this set of 8."
In practice, you may want the model to score the available choices, rather than simply picking a particular card. Return rankings, or fitness metrics, or probabilities of picking each particular card.
You can supply some supervision in choice of input organization. For instance, you can provide each card as a set of characteristics, rather than simply a card ID -- give the algorithm a chance to "understand" building a consistent deck type.
You might also wish to put in some work to abstract (i.e. simplify) the current game state, such as writing evaluation routines to summarize the other decks being built. For instance, if there are 6 players in the group, and your RHO and his opposite are both building burn decks, you don't want to do the same -- RHO will take the best burn cards in 5 of 6 decks passed around, leaving you with 2nd (or 3rd) choice.
As for the algorithm ...
A neural network will explode with this many input variables. You'll want something simpler that matches your input data. If you go with abstracted properties, you might consider a decision-tree algorithm such as Random Forest, or a simple probabilistic one such as Naive Bayes. You might also go for a collaborative filtering model, given the similarities in situations.
I hope this helps launch you toward designing your features. Do note that you're attacking a complex problem: one of the features that can make a game popular for humans is that it's hard to automate the decision-making.
Every single "pick" is a decision, with the input information being A (what you already have) and B (what the available choices are).
Thus, a machine that decides "whether you should pick this card" can be a simple binary classifier given the input A + B + (the card in question).
For example, someone's pack 1, pick 2 basically provides 1 "yes" (the card picked) and 13 "no"s (the cards not picked), for a total of 14 rows of training data.
We may want to weight these training rows according to which pick it is. (When there are fewer cards left, the choice might matter less than when there are more choices.)
We may also want to weight these training data according to the rarity of cards.
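The row construction and pick-based weighting described above could be sketched like this (the card names and the particular weighting scheme are illustrative assumptions, not part of the original data):

```python
# Illustrative sketch: turn one draft pick into binary training rows,
# one row per card shown, weighted by how early in the pack the pick is.

def pick_to_rows(deck_so_far, pack, chosen_card, pick_number, pack_size=15):
    """deck_so_far: cards already taken (context A); pack: cards offered
    this pick (context B); chosen_card: the card the human took."""
    cards_left = pack_size - pick_number + 1   # e.g. pick 2 -> 14 cards shown
    weight = cards_left / pack_size            # earlier picks weigh more
    rows = []
    for card in pack:
        rows.append({
            "deck": tuple(deck_so_far),        # context A
            "pack": tuple(pack),               # context B
            "card": card,                      # the card in question
            "label": 1 if card == chosen_card else 0,
            "weight": weight,
        })
    return rows

# Late pick: only 3 cards left in the pack (pick 13 of 15)
rows = pick_to_rows(deck_so_far=["Shock"],
                    pack=["Giant Growth", "Counterspell", "Llanowar Elves"],
                    chosen_card="Counterspell",
                    pick_number=13)
print(sum(r["label"] for r in rows), "positive row,",
      len(rows) - 1, "negative rows, weight", rows[0]["weight"])
```

The same scheme extends naturally to a per-card rarity weight as a second multiplicative factor.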
Finally, the main challenge is that the input data (the features), A + B + card, is inappropriate unless we apply a smart transformation. (Simply treating the card as a categorical variable and one-hot encoding it leads to something that is too big and has very low information density. That will definitely fail.)
This challenge can be resolved by making it a two-stage process: first vectorize the cards, then build features based on the vectors.
http://www.cs.toronto.edu/~mvolkovs/recsys2018_challenge.pdf
Objective
To clarify which traits or attributes let me say an analysis is inferential or predictive.
Background
I am taking a data science course which touches on inferential and predictive analyses. The explanations (as I understood them) are:
Inferential
Induce a hypothesis from a small sample of a population, and see whether it is true in the larger/entire population.
It seems to me this is generalisation. I think inferring that smoking causes lung cancer, or that CO2 causes global warming, are inferential analyses.
Predictive
Induce a statement of what can happen by measuring variables of an object.
I think identifying which traits, behaviour, and remarks people react favourably to, making a presidential candidate popular enough to become president, is a predictive analysis (this is touched on in the course as well).
Question
I am a bit confused by the two, as it looks to me like there is a grey area or overlap.
Bayesian Inference is "inference" but I think it is used for prediction such as in a spam filter or fraudulent financial transaction identification. For instance, a bank may use previous observations on variables (such as IP address, originator country, beneficiary account type, etc) and predict if a transaction is fraudulent.
I suppose the theory of relativity is an inferential analysis that induced a theory/hypothesis from observations and thought experiments, but it also predicted that the direction of light would be bent.
Kindly help me understand what the must-have attributes are for categorising an analysis as inferential or predictive.
"What is the question?" by Jeffrey T. Leek and Roger D. Peng has a nice description of the various types of analysis that go into a typical data science workflow. To address your question specifically:
An inferential data analysis quantifies whether an observed pattern will likely hold beyond the data set in hand. This is the most common statistical analysis in the formal scientific literature. An example is a study of whether air pollution correlates with life expectancy at the state level in the United States (9). In nonrandomized experiments, it is usually only possible to determine the existence of a relationship between two measurements, but not the underlying mechanism or the reason for it.
Going beyond an inferential data analysis, which quantifies the relationships at population scale, a predictive data analysis uses a subset of measurements (the features) to predict another measurement (the outcome) on a single person or unit. Web sites like FiveThirtyEight.com use polling data to predict how people will vote in an election. Predictive data analyses only show that you can predict one measurement from another; they do not necessarily explain why that choice of prediction works.
There is some gray area between the two but we can still make distinctions.
Inferential statistics is when you are trying to understand what causes a certain outcome. In such analyses there is a specific focus on the independent variables, and you want to make sure you have an interpretable model. For instance, your example of a study examining whether smoking causes lung cancer is inferential. Here you are trying to closely examine the factors that lead to lung cancer, and smoking happens to be one of them.
In predictive analytics you are more interested in using a certain dataset to help you predict future variation in the values of the outcome variable. Here you can make your model as complex as you like, even to the point that it is not interpretable, as long as it gets the job done. A simplified example is a real estate investment company interested in determining which combination of variables predicts the prime price for a certain property, so it can acquire properties for profit. The potential predictors could be neighborhood income, crime, educational status, distance to a beach, and racial makeup. The primary aim here is to obtain the combination of these variables that best predicts future house prices.
Here is where it gets murky. Let's say you conduct a study on middle-aged men to determine the risks of heart disease. To do this you measure weight, height, race, income, marital status, cholesterol, education, and a potential serum chemical called "mx34" (just making this up), among others. Let's say you find that the chemical is indeed a good risk factor for heart disease. You have now achieved your inferential objective. Now, satisfied with your new findings, you start to wonder whether you can use these variables to predict who is likely to get heart disease, so that you can recommend preventive steps against future heart disease.
The same academic paper I was reading that spurred this question for me also gave an answer (from Leo Breiman, a UC Berkeley statistician):
• Prediction. To be able to predict what the responses are going to be to future input variables;
• [Inference]. To [infer] how nature is associating the response variables to the input variables.
Source: http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf
I have a collection of training documents with publication dates, where each document is labeled as belonging (or not) to some topic T. I want to train a model that will predict for a new document (with publication date) whether or not it belongs to T, where the publication date might be in the past or in the future. Assume that I have decomposed each training document's text into a set of features (e.g., TF-IDF of words or n-grams) suitable for analysis by an appropriate binary classification algorithm provided by a library like Weka (for instance, multinomial naive Bayes, random forests, or SVM). The concept to be learned exhibits multiple seasonality; i.e., the prior probability that an arbitrary document published on a given date belongs to T depends heavily on when the date falls in a 4-year cycle (due to elections), where it falls in an annual cycle (due to holidays), and on the day of the week.
My research indicates that classification algorithms generally assume (as part of their statistical models) that training data is randomly sampled from the same pool of data that the model will ultimately be applied to. When the distribution of classes in the training data differs substantially from the known distribution in the wild, this leads to the so-called "class imbalance" problem. There are ways of compensating for this, including over-sampling underrepresented classes, under-sampling overrepresented classes, and using cost-sensitive classification. This allows a model creator to implicitly specify the prior probability that a new document will be positively classified, but importantly (and unfortunately for my purposes), this prior probability is assumed to be equal for all new documents.
I require more flexibility in my model. Because of the concept's seasonality, when classifying a new document, the model must explicitly take the publication date into account when determining the prior probability that the document belongs to T, and when the model calculates the posterior probability of belonging to T in light of the document's features, this prior probability should be properly accounted for. I am looking for a classifier implementation that either (1) bakes sophisticated regression of prior probabilities based on dates into the classifier, or (2) can be extended with a user-specified regression function that takes a date as input and gives the prior probability as output.
I am most familiar with the Weka library, but am open to using other tools if they are appropriate to the job. What is the most straightforward way of accomplishing this task?
Edit (in response to Doxav's point #2):
My concern is that date-based attributes should not be used for learning rules about when the topic applies, rather, they should be used only for determining the prior probability of whether the topic applies. Here's a concrete example: suppose that the topic T is "Christmas". A story published in July is indeed much less likely to be about Christmas than a story published in December. But what makes a story about Christmas is the textual content of the story, not when it was published. The relationship between publication date and "being about Christmas" is mere correlation, and therefore only useful for calculating the prior probability of an arbitrary story on an arbitrary date being about Christmas. By comparison, the relationship between TF-IDF (for some term in the story text) and "being about Christmas" is inherent and causative, and therefore worthy of incorporation into our model of what it means for a story to be about Christmas.
It seems like it can be broken down into typical ML subproblems: text classification + imbalanced data + seasonality identification + architecture + typical batch/offline vs. stream/online learning:
Text classification: https://www.youtube.com/watch?v=IY29uC4uem8 is a good tutorial on text classification with Weka and covers the imbalanced-data issue.
Seasonality identification: the goal is to enable the model to learn rules/inferences over different time attributes, so we should make its job easier by extracting the most useful known attributes. That means extracting typical date cycles (i.e. weekday, day of month, month, year...) and, if possible, merging them with other more specific cycles or events (i.e. elections, holidays, any custom cycle or frequent event). If you expect the model to learn on time series/sequences, you should create some lag attributes (values from earlier points, or statistics over a recent time interval). It can be good to remove the date itself, or any data that would bias the model construction.
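A small sketch of that date-cycle extraction (the election anchor year here is an arbitrary assumption):

```python
# Sketch: extract the cyclic date attributes described above
# (weekday, day of month, month, position in a 4-year election cycle).
from datetime import date

def date_features(d: date, election_year: int = 2016) -> dict:
    return {
        "weekday": d.weekday(),                             # 0 = Monday
        "day_of_month": d.day,
        "month": d.month,
        "years_into_election_cycle": (d.year - election_year) % 4,
    }

feats = date_features(date(2018, 12, 24))
print(feats)  # a December Monday, 2 years into the cycle
```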
I don't know if you plan to deliver this as a service, but this can be of good inspiration: http://fr.slideshare.net/TraianRebedea/autonomous-news-clustering-and-classification-for-an-intelligent-web-portal .
Typical batch/offline vs. stream/online learning: apparently you already know Weka, which focuses on batch/offline learning. I don't know the size of your data, but if you plan to continuously process new data and rebuild models, you could consider moving to stream processing and online learning. You could then move to MOA, which is very close to Weka but dedicated to stream classification, or use the streaming features of the latest version of Weka (stream processing and new online learners).
UPDATE 1: I read your comment and I see different solutions:
Answer #2 is still one possible solution for your need, even if it is not optimal. An attribute indicating that it is the Christmas period will assign a higher probability to tagging a story as a Christmas topic, and the same goes for the TF-IDF of the word "Christmas"; BUT only both attributes together will push the classification probability for Christmas very high.
you can use an attribute providing a seasonal weight for each word: TF-IDF with time weight, or use current Google Trends data for each word.
If you want a state-of-the-art adaptive prior conditioned on context, you could look into hierarchical Bayesian models and smoothing techniques from NLP. It won't be Weka then, and it won't be as fast to test.
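One further option for the explicit date-dependent prior the question asks about, without retraining: re-weight the classifier's posterior odds by the ratio of the date-based prior to the training-set base rate. A sketch (the seasonal prior function is a toy assumption):

```python
# Sketch: adjust a classifier's posterior probability p, obtained under
# base rate train_prior, to reflect a date-dependent prior new_prior.

def adjust_posterior(p: float, train_prior: float, new_prior: float) -> float:
    """Bayes-consistent prior correction: rescale the posterior odds by
    the new-vs-training prior odds ratio."""
    odds = (p / (1 - p)) * (new_prior / train_prior) * \
           ((1 - train_prior) / (1 - new_prior))
    return odds / (1 + odds)

def christmas_prior(month: int) -> float:
    return 0.30 if month == 12 else 0.01   # toy seasonal prior

# Classifier says 0.5 for a story, under a 5% training base rate:
print(round(adjust_posterior(0.5, 0.05, christmas_prior(12)), 3))  # December: up
print(round(adjust_posterior(0.5, 0.05, christmas_prior(7)), 3))   # July: down
```

The same text classifier is used on every date; only the prior term moves with the publication date.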
I am new to machine learning. While reading about supervised learning, unsupervised learning, and reinforcement learning, I came across the question below and got confused. Please help me identify which of the following three scenarios is supervised learning, unsupervised learning, or reinforcement learning.
What types of learning, if any, best describe the following three scenarios:
(i) A coin classification system is created for a vending machine. In order to do this, the developers obtain exact coin specifications from the U.S. Mint and derive a statistical model of the size, weight, and denomination, which the vending machine then uses to classify its coins.
(ii) Instead of calling the U.S. Mint to obtain coin information, an algorithm is presented with a large set of labeled coins. The algorithm uses this data to infer decision boundaries, which the vending machine then uses to classify its coins.
(iii) A computer develops a strategy for playing Tic-Tac-Toe by playing repeatedly and adjusting its strategy by penalizing moves that eventually lead to losing.
(i) unsupervised learning - no labelled data is available
(ii) supervised learning - you already have labelled data available
(iii) reinforcement learning - you learn and relearn based on actions and the effects/rewards of those actions
Let's say you have a dataset represented as a matrix X. Each row in X is an observation (instance) and each column represents a particular variable (feature).
If you also have (and use) a vector y of labels corresponding to the observations, then this is a supervised learning task. There's a "supervisor" involved that says which observations belong to class #1, which to class #2, etc.
If you don't have labels for the observations, then you have to make decisions based on the X dataset itself. For example, in the coin example you may want to build a model of the normal distribution of the coin parameters and create a system that signals when a coin has unusual parameters (and thus may be an attempted fraud). In this case you don't have any kind of supervisor to say which coins are OK and which represent a fraud attempt, so it is an unsupervised learning task.
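A minimal sketch of that unsupervised coin check, using only the Python standard library (the coin weights are invented):

```python
# Sketch: fit a normal distribution to the weights of observed coins and
# flag coins whose weight is an outlier (a possible fraud attempt).
import statistics

weights = [5.00, 5.02, 4.98, 5.01, 4.99, 5.03, 4.97, 5.00]  # grams (invented)
mu = statistics.mean(weights)
sigma = statistics.stdev(weights)

def looks_fraudulent(weight: float, z_threshold: float = 3.0) -> bool:
    """Flag a coin whose weight lies too many standard deviations from
    the mean of the fitted distribution. No labels are needed."""
    return abs(weight - mu) / sigma > z_threshold

print(looks_fraudulent(5.01))  # typical coin
print(looks_fraudulent(5.60))  # far outside the fitted distribution
```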
In the two previous examples you first trained your model and then used it, without any further changes to the model. In reinforcement learning the model is continuously improved based on the data it processes and the results. For example, a robot that seeks to find its way from point A to point B may first compute the parameters of a move, then move based on those parameters, then analyze its new position and update the move parameters so that the next move is more accurate (repeating until it gets to point B).
Based on this, I'm pretty sure you will be able to find correspondence between these 3 kinds of learning and your items.
I wrote an article on the perceptron for novices. It explains supervised learning in detail with the delta rule, and also briefly describes unsupervised learning and reinforcement learning. You may check it out if you are interested.
"An Intuitive Example of Artificial Neural Network (Perceptron) Detecting Cars / Pedestrians from a Self-driven Car"
https://www.spicelogic.com/Blog/Perceptron-Artificial-Neural-Networks-10