DQN on recommendation system - machine-learning

I want to use DQN on recommendation system for retail industry
but the problem is, the state space of this question are time-inhomogeneous & not deterministic
(compare to Atari games)
I figure out two method for this problem
make state-transition become deterministic
use historical data to calculate transition probabilities, use probabilities to transit state
but...both of them seems not make sense
somebody point out this kind issues
if I want to build a recommendation system based on Reinforcement Learning
where should I start?

Related

Is reinforcement learning applicable to a RANDOM environment?

I have a fundamental question on the applicability of reinforcement learning (RL) on a problem we are trying to solve.
We are trying to use RL for inventory management - where the demand is entirely random (it probably has a pattern in real life but for now let us assume that we have been forced to treat as purely random).
As I understand, RL can help learn how to play a game (say chess) or help a robot learn to walk. But all games have rules and so does the ‘cart-pole’ (of OpenAI Gym) – there are rules of ‘physics’ that govern when the cart-pole will tip and fall over.
For our problem there are no rules – the environment changes randomly (demand made for the product).
Is RL really applicable to such situations?
If it does - then what will improve the performance?
Further details:
- The only two stimuli available from the ‘environment’ are the currently available level of product 'X' and the current demand 'Y'
- And the ‘action’ is binary - do I order a quantity 'Q' to refill or do I not (discrete action space).
- We are using DQN and an Adam optimizer.
Our results are poor - I admit I have trained only for about 5,000 or 10,000 - should I let it train on for days because it is a random environment?
thank you
Rajesh
You are saying random in the sense of non-stationary, so, no, RL is not the best here.
Reinforcement learning assumes your environment is stationary. The underlying probability distribution of your environment (both transition and reward function) must be held constant throughout the learning process.
Sure, RL and DRL can deal with some slightly non-stationary problems, but it struggles at that. Markov Decision Processes (MDPs) and Partially-Observable MDPs assume stationarity. So value-based algorithms, which are specialized in exploiting MDP-like environments, such as SARSA, Q-learning, DQN, DDQN, Dueling DQN, etc., will have a hard time learning anything in non-stationary environments. The more you go towards policy-based algorithms, such as PPO, TRPO, or even better gradient-free, such as GA, CEM, etc., the better chance you have as these algorithms don't try to exploit this assumption. Also, playing with the learning rate would be essential to make sure the agent never stops learning.
Your best bet is to go towards black-box optimization methods such as Genetic Algorithms, etc.
Randomness can be handled by replacing single average reward output with a distribution with possible values. By introducing a new learning rule, reflecting the transition from Bellman’s (average) equation to its distributional counterpart, the Value distribution approach has been able to surpass the performance of all other comparable approaches.
https://www.deepmind.com/blog/going-beyond-average-for-reinforcement-learning

Reinforcement Learning: The dilemma of choosing discretization steps and performance metrics for continuous action and continuous state space

I am trying to write an adaptive controller for a control system, namely a power management system using Q-learning. I recently implemented a toy RL problem for the cart-pole system and worked out the formulation of the helicopter control problem from Andrew NG's notes. I appreciate how value function approximation is imperative in such situations. However both these popular examples have very small number of possible discrete actions. I have three questions:
1) What is the correct way to handle such problems if you don't have a small number of discrete actions? The dimensionality of my actions and states seems to have blown up and the learning looks very poor, which brings me to my next question.
2) How do I measure the performance of my agent? Since the reward changes in conjunction with the dynamic environment, at every time-step I can't decide the performance metrics for my continuous RL agent. Also unlike gridworld problems, I can't check the Q-value table due to huge state-action pairs, how do I know my actions are optimal?
3) Since I have a model for the evoluation of states through time. States = [Y, U]. Y[t+1] = aY[t] + bA, where A is an action.
Choosing discretization step for actions A will also affect how finely I have to discretize my state variable Y. How do I choose my discretization steps?
Thanks a lot!
You may use a continuous action reinforcement learning algorithm and completely avoid the discretization issue. I'd suggest you to take a look at CACLA.
As for the performance, you need to measure your agent's accumulated reward during an episode with learning turned off. Since your environment is stochastic, take many measurements and average them.
Have a look at policy search algorithms. Basically, they directly learn a parametric policy without an explicit value function, thus avoiding the problem of approximating the Q-function for continuous actions (eg, no discretization of the action space is needed).
One of the easiest and earliest policy search algorithm is policy gradient. Have a look here for a quick survey about the topic. And here for a survey about policy search (currently, there are more recent techniques, but that's a very good starting point).
In the case of control problem, there is a very simple toy task you can look, the Linear Quadratic Gaussian Regulator (LQG). Here you can find a lecture including this example and also an introduction to policy search and policy gradient.
Regarding your second point, if your environment is dynamic (that is, the reward function of the transition function (or both) change through time), then you need to look at non-stationary policies. That's typically a much more challenging problem in RL.

In Q-learning with function approximation, is it possible to avoid hand-crafting features?

I have little background knowledge of Machine Learning, so please forgive me if my question seems silly.
Based on what I've read, the best model-free reinforcement learning algorithm to this date is Q-Learning, where each state,action pair in the agent's world is given a q-value, and at each state the action with the highest q-value is chosen. The q-value is then updated as follows:
Q(s,a) = (1-α)Q(s,a) + α(R(s,a,s') + (max_a' * Q(s',a'))) where α is the learning rate.
Apparently, for problems with high dimensionality, the number of states become astronomically large making q-value table storage infeasible.
So the practical implementation of Q-Learning requires using Q-value approximation via generalization of states aka features. For example if the agent was Pacman then the features would be:
Distance to closest dot
Distance to closest ghost
Is Pacman in a tunnel?
And then instead of q-values for every single state you would only need to only have q-values for every single feature.
So my question is:
Is it possible for a reinforcement learning agent to create or generate additional features?
Some research I've done:
This post mentions A Geramifard's iFDD method
http://www.icml-2011.org/papers/473_icmlpaper.pdf
http://people.csail.mit.edu/agf/Files/13RLDM-GQ-iFDD+.pdf
which is a way of "discovering feature dependencies", but I'm not sure if that is feature generation, as the paper assumes that you start off with a set of binary features.
Another paper that I found was apropos is Playing Atari with Deep Reinforcement Learning, which "extracts high level features using a range of neural network architectures".
I've read over the paper but still need to flesh out/fully understand their algorithm. Is this what I'm looking for?
Thanks
It seems like you already answered your own question :)
Feature generation is not part of the Q-learning (and SARSA) algorithm. In a process which is called preprocessing you can however use a wide array of algorithms (of which you showed some) to generate/extract features from your data. Combining different machine learning algorithms results in hybrid architectures, which is a term you might look into when researching what works best for your problem.
Here is an example of using features with SARSA (which is very similar to Q-learning).
Whether the papers you cited are helpful for your scenario, you'll have to decide for yourself. As always with machine learning, your approach is highly problem-dependent. If you're in robotics and it's hard to define discrete states manually, a neural network might be helpful. If you can think of heuristics by yourself (like in the pacman example) then you probably won't need it.

What are some effective techniques for learning heuristic weights?

I have a minimax game playing program that sums together different heuristics to return a value for each state of the game. I would like to implement learning. I want the program to learn weights for each heuristic. What is the most effective means of having the program learn the weights for each heuristic? Of course, it would only know if a certain weight was effective for a certain heuristic after trying it. Is the only option some kind of trial and error system?
Thank you for your help!
I've not applied minimax much in practice - but in general its preferable to have an intrinsic measure of score/goodness/badness to base it off of. The first step would be to try and define such a score for you game - and expose that as an interface that is implemented for each supported game.
Is the only option some kind of trial and error system?
No! Genetic algorithms are popular for this kind of thing (at least among hobbyists), and can be used successfully for many problems (given sufficient time). You can find a lot of information related to this in early AI research, especially related to chess programs.
You can look up some of the research in hyperparameter optimization to find more machine learning style ways to do it. Unfortunately its not as well studied an area as it probably should be.
There are more possibilities depending on the specifics of the game being implemented / the nature of the heuristics.
Reinforcement Learning (RL), in particual Temporal Difference (TD) methods, deal with learning weights for heuristics in non-adversarial setting. How to learn weights for heuristics in a game setting, depends on what algorithms you use to play the game. The major classes of algorithms are alpha-beta minimax and UpperConfidenceTree. For minimax, you can look at the updates to values on the tree nodes as you increase the depth of the tree. I recommend starting by learning about RL-TD, and then reading Bootstrapping from Game Tree Search
by Joel Veness et. al.

What is machine learning? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
What is machine learning ?
What does machine learning code do ?
When we say that the machine learns, does it modify the code of itself or it modifies history (database) which will contain the experience of code for given set of inputs?
What is a machine learning ?
Essentially, it is a method of teaching computers to make and improve predictions or behaviors based on some data. What is this "data"? Well, that depends entirely on the problem. It could be readings from a robot's sensors as it learns to walk, or the correct output of a program for certain input.
Another way to think about machine learning is that it is "pattern recognition" - the act of teaching a program to react to or recognize patterns.
What does machine learning code do ?
Depends on the type of machine learning you're talking about. Machine learning is a huge field, with hundreds of different algorithms for solving myriad different problems - see Wikipedia for more information; specifically, look under Algorithm Types.
When we say machine learns, does it modify the code of itself or it modifies history (Data Base) which will contain the experience of code for given set of inputs ?
Once again, it depends.
One example of code actually being modified is Genetic Programming, where you essentially evolve a program to complete a task (of course, the program doesn't modify itself - but it does modify another computer program).
Neural networks, on the other hand, modify their parameters automatically in response to prepared stimuli and expected response. This allows them to produce many behaviors (theoretically, they can produce any behavior because they can approximate any function to an arbitrary precision, given enough time).
I should note that your use of the term "database" implies that machine learning algorithms work by "remembering" information, events, or experiences. This is not necessarily (or even often!) the case.
Neural networks, which I already mentioned, only keep the current "state" of the approximation, which is updated as learning occurs. Rather than remembering what happened and how to react to it, neural networks build a sort of "model" of their "world." The model tells them how to react to certain inputs, even if the inputs are something that it has never seen before.
This last ability - the ability to react to inputs that have never been seen before - is one of the core tenets of many machine learning algorithms. Imagine trying to teach a computer driver to navigate highways in traffic. Using your "database" metaphor, you would have to teach the computer exactly what to do in millions of possible situations. An effective machine learning algorithm would (hopefully!) be able to learn similarities between different states and react to them similarly.
The similarities between states can be anything - even things we might think of as "mundane" can really trip up a computer! For example, let's say that the computer driver learned that when a car in front of it slowed down, it had to slow down to. For a human, replacing the car with a motorcycle doesn't change anything - we recognize that the motorcycle is also a vehicle. For a machine learning algorithm, this can actually be surprisingly difficult! A database would have to store information separately about the case where a car is in front and where a motorcycle is in front. A machine learning algorithm, on the other hand, would "learn" from the car example and be able to generalize to the motorcycle example automatically.
Machine learning is a field of computer science, probability theory, and optimization theory which allows complex tasks to be solved for which a logical/procedural approach would not be possible or feasible.
There are several different categories of machine learning, including (but not limited to):
Supervised learning
Reinforcement learning
Supervised Learning
In supervised learning, you have some really complex function (mapping) from inputs to outputs, you have lots of examples of input/output pairs, but you don't know what that complicated function is. A supervised learning algorithm makes it possible, given a large data set of input/output pairs, to predict the output value for some new input value that you may not have seen before. The basic method is that you break the data set down into a training set and a test set. You have some model with an associated error function which you try to minimize over the training set, and then you make sure that your solution works on the test set. Once you have repeated this with different machine learning algorithms and/or parameters until the model performs reasonably well on the test set, then you can attempt to use the result on new inputs. Note that in this case, the program does not change, only the model (data) is changed. Although one could, theoretically, output a different program, but that is not done in practice, as far as I am aware. An example of supervised learning would be the digit recognition system used by the post office, where it maps the pixels to labels in the set 0...9, using a large set of pictures of digits that were labeled by hand as being in 0...9.
Reinforcement Learning
In reinforcement learning, the program is responsible for making decisions, and it periodically receives some sort of award/utility for its actions. However, unlike in the supervised learning case, the results are not immediate; the algorithm could prescribe a large sequence of actions and only receive feedback at the very end. In reinforcement learning, the goal is to build up a good model such that the algorithm will generate the sequence of decisions that lead to the highest long term utility/reward. A good example of reinforcement learning is teaching a robot how to navigate by giving a negative penalty whenever its bump sensor detects that it has bumped into an object. If coded correctly, it is possible for the robot to eventually correlate its range finder sensor data with its bumper sensor data and the directions that sends to the wheels, and ultimately choose a form of navigation that results in it not bumping into objects.
More Info
If you are interested in learning more, I strongly recommend that you read Pattern Recognition and Machine Learning by Christopher M. Bishop or take a machine learning course. You may also be interested in reading, for free, the lecture notes from CIS 520: Machine Learning at Penn.
Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. Read more on Wikipedia
Machine learning code records "facts" or approximations in some sort of storage, and with the algorithms calculates different probabilities.
The code itself will not be modified when a machine learns, only the database of what "it knows".
Machine learning is a methodology to create a model based on sample data and use the model to make a prediction or strategy. It belongs to artificial intelligence.
Machine learning is simply a generic term to define a variety of learning algorithms that produce a quasi learning from examples (unlabeled/labeled). The actual accuracy/error is entirely determined by the quality of training/test data you provide to your learning algorithm. This can be measured using a convergence rate. The reason you provide examples is because you want the learning algorithm of your choice to be able to informatively by guidance make generalization. The algorithms can be classed into two main areas supervised learning(classification) and unsupervised learning(clustering) techniques. It is extremely important that you make an informed decision on how you plan on separating your training and test data sets as well as the quality that you provide to your learning algorithm. When you providing data sets you want to also be aware of things like over fitting and maintaining a sense of healthy bias in your examples. The algorithm then basically learns wrote to wrote on the basis of generalization it achieves from the data you have provided to it both for training and then for testing in process you try to get your learning algorithm to produce new examples on basis of your targeted training. In clustering there is very little informative guidance the algorithm basically tries to produce through measures of patterns between data to build related sets of clusters e.g kmeans/knearest neighbor.
some good books:
Introduction to ML (Nilsson/Stanford),
Gaussian Process for ML,
Introduction to ML (Alpaydin),
Information Theory Inference and Learning Algorithms (very useful book),
Machine Learning (Mitchell),
Pattern Recognition and Machine Learning (standard ML course book at Edinburgh and various Unis but relatively a heavy reading with math),
Data Mining and Practical Machine Learning with Weka (work through the theory using weka and practice in Java)
Reinforcement Learning there is a free book online you can read:
http://www.cs.ualberta.ca/~sutton/book/ebook/the-book.html
IR, IE, Recommenders, and Text/Data/Web Mining in general use alot of Machine Learning principles. You can even apply Metaheuristic/Global Optimization Techniques here to further automate your learning processes. e.g apply an evolutionary technique like GA (genetic algorithm) to optimize your neural network based approach (which may use some learning algorithm). You can approach it purely in form of a probablistic machine learning approach for example bayesian learning. Most of these algorithms all have a very heavy use of statistics. Concepts of convergence and generalization are important to many of these learning algorithms.
Machine learning is the study in computing science of making algorithms that are able to classify information they haven't seen before, by learning patterns from training on similar information. There are all sorts of kinds of "learners" in this sense. Neural networks, Bayesian networks, decision trees, k-clustering algorithms, hidden markov models and support vector machines are examples.
Based on the learner, they each learn in different ways. Some learners produce human-understandable frameworks (e.g. decision trees), and some are generally inscrutable (e.g. neural networks).
Learners are all essentially data-driven, meaning they save their state as data to be reused later. They aren't self-modifying as such, at least in general.
I think one of the coolest definitions of machine learning that I've read is from this book by Tom Mitchell. Easy to remember and intuitive.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E
Shamelessly ripped from Wikipedia: Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases.
Quite simply, machine learning code accomplishes a machine learning task. That can be a number of things from interpreting sensor data to a genetic algorithm.
I would say it depends. No, modifying code is not normal, but is not outside the realm of possibility. I would also not say that machine learning always modifies a history. Sometimes we have no history to build off of. Sometime we simply want to react to the environment, but not actually learn from our past experiences.
Basically, machine learning is a very wide-open discipline that contains many methods and algorithms that make it impossible for there to be 1 answer to your 3rd question.
Machine learning is a term that is taken from the real world of a person, and applied on something that can't actually learn - a machine.
To add to the other answers - machine learning will not (usually) change the code, but it might change it's execution path and decision based on previous data or new gathered data and hence the "learning" effect.
there are many ways to "teach" a machine - you give weights to many parameter of an algorithm, and then have the machine solve it for many cases, each time you give her a feedback about the answer and the machine adjusts the weights according to how close the machine answer was to your answer or according to the score you gave it's answer, or according to some results test algorithm.
This is one way of learning and there are many more...

Resources