How to train an artificial neural network to play Diablo 2 using visual input? - machine-learning

I'm currently trying to get an ANN to play a video game and and I was hoping to get some help from the wonderful community here.
I've settled on Diablo 2. Game play is thus in real-time and from an isometric viewpoint, with the player controlling a single avatar whom the camera is centered on.
To make things concrete, the task is to get your character x experience points without having its health drop to 0, where experience point are gained through killing monsters. Here is an example of the gameplay:
Now, since I want the net to operate based solely on the information it gets from the pixels on the screen, it must learn a very rich representation in order to play efficiently, since this would presumably require it to know (implicitly at least) how divide the game world up into objects and how to interact with them.
And all of this information must be taught to the net somehow. I can't for the life of me think of how to train this thing. My only idea is have a separate program visually extract something innately good/bad in the game (e.g. health, gold, experience) from the screen, and then use that stat in a reinforcement learning procedure. I think that will be part of the answer, but I don't think it'll be enough; there are just too many levels of abstraction from raw visual input to goal-oriented behavior for such limited feedback to train a net within my lifetime.
So, my question: what other ways can you think of to train a net to do at least some part of this task? preferably without making thousands of labeled examples.
Just for a little more direction: I'm looking for some other sources of reinforcement learning and/or any unsupervised methods for extracting useful information in this setting. Or a supervised algorithm if you can think of a way of getting labeled data out of a game world without having to manually label it.
UPDATE(04/27/12):
Strangely, I'm still working on this and seem to be making progress. The biggest secret to getting a ANN controller to work is to use the most advanced ANN architectures appropriate to the task. Hence I've been using a deep belief net composed of factored conditional restricted Boltzmann machines that I've trained in an unsupervised manner (on video of me playing the game) before fine tuning with temporal difference back-propagation (i.e. reinforcement learning with standard feed-forward ANNs).
Still looking for more valuable input though, especially on the problem of action selection in real-time and how to encode color images for ANN processing :-)
UPDATE(10/21/15):
Just remembered I asked this question back-in-the-day, and thought I should mention that this is no longer a crazy idea. Since my last update, DeepMind published their nature paper on getting neural networks to play Atari games from visual inputs. Indeed, the only thing preventing me from using their architecture to play, a limited subset, of Diablo 2 is the lack of access to the underlying game engine. Rendering to the screen and then redirecting it to the network is just far too slow to train in a reasonable amount of time. Thus we probably won't see this sort of bot playing Diablo 2 anytime soon, but only because it'll be playing something either open-source or with API access to the rendering target. (Quake perhaps?)

I can see that you are worried about how to train the ANN, but this project hides a complexity that you might not be aware of. Object/character recognition on computer games through image processing it's a highly challenging task (not say crazy for FPS and RPG games). I don't doubt of your skills and I'm also not saying it can't be done, but you can easily spend 10x more time working on recognizing stuff than implementing the ANN itself (assuming you already have experience with digital image processing techniques).
I think your idea is very interesting and also very ambitious. At this point you might want to reconsider it. I sense that this project is something you are planning for the university, so if the focus of the work is really ANN you should probably pick another game, something more simple.
I remember that someone else came looking for tips on a different but somehow similar project not too long ago. It's worth checking it out.
On the other hand, there might be better/easier approaches for identifying objects in-game if you're accepting suggestions. But first, let's call this project for what you want it to be: a smart-bot.
One method for implementing bots accesses the memory of the game client to find relevant information, such as the location of the character on the screen and it's health. Reading computer memory is trivial, but figuring out exactly where in memory to look for is not. Memory scanners like Cheat Engine can be very helpful for this.
Another method, which works under the game, involves manipulating rendering information. All objects of the game must be rendered to the screen. This means that the locations of all 3D objects will eventually be sent to the video card for processing. Be ready for some serious debugging.
In this answer I briefly described 2 methods to accomplish what you want through image processing. If you are interested in them you can find more about them on Exploiting Online Games (chapter 6), an excellent book on the subject.

UPDATE 2018-07-26: That's it! We are now approaching the point where this kind of game will be solvable! Using OpenAI and based on the game DotA 2, a team could make an AI that can beat semi-professional gamers in a 5v5 game. If you know DotA 2, you know this game is quite similar to Diablo-like games in terms of mechanics, but one could argue that it is even more complicated because of the team play.
As expected, this was achieved thanks to the latest advances in reinforcement learning with deep learning, and using open game frameworks like OpenAI which eases the development of an AI since you get a neat API and also because you can accelerate the game (the AI played the equivalent of 180 years of gameplay against itself everyday!).
On the 5th of August 2018 (in 10 days!), it is planned to pit this AI against top DotA 2 gamers. If this works out, expect a big revolution, maybe not as mediatized as the solving of the Go game, but it will nonetheless be a huge milestone for games AI!
UPDATE 2017-01: The field is moving very fast since AlphaGo's success, and there are new frameworks to facilitate the development of machine learning algorithms on games almost every months. Here is a list of the latest ones I've found:
OpenAI's Universe: a platform to play virtually any game using machine learning. The API is in Python, and it runs the games behind a VNC remote desktop environment, so it can capture the images of any game! You can probably use Universe to play Diablo II through a machine learning algorithm!
OpenAI's Gym: Similar to Universe but targeting reinforcement learning algorithms specifically (so it's kind of a generalization of the framework used by AlphaGo but to a lot more games). There is a course on Udemy covering the application of machine learning to games like breakout or Doom using OpenAI Gym.
TorchCraft: a bridge between Torch (machine learning framework) and StarCraft: Brood War.
pyGTA5: a project to build self-driving cars in GTA5 using only screen captures (with lots of videos online).
Very exciting times!
IMPORTANT UPDATE (2016-06): As noted by OP, this problem of training artificial networks to play games using only visual inputs is now being tackled by several serious institutions, with quite promising results, such as DeepMind Deep-Qlearning-Network (DQN).
And now, if you want to get to take on the next level challenge, you can use one of the various AI vision game development platforms such as ViZDoom, a highly optimized platform (7000 fps) to train networks to play Doom using only visual inputs:
ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is primarily intended for research in machine visual learning, and deep reinforcement learning, in particular.
ViZDoom is based on ZDoom to provide the game mechanics.
And the results are quite amazing, see the videos on their webpage and the nice tutorial (in Python) here!
There is also a similar project for Quake 3 Arena, called Quagents, which also provides easy API access to underlying game data, but you can scrap it and just use screenshots and the API only to control your agent.
Why is such a platform useful if we only use screenshots? Even if you don't access underlying game data, such a platform provide:
high performance implementation of games (you can generate more data/plays/learning generations with less time so that your learning algorithms can converge faster!).
a simple and responsive API to control your agents (ie, if you try to use human inputs to control a game, some of your commands may be lost, so you'd also deal with unreliability of your outputs...).
easy setup of custom scenarios.
customizable rendering (can be useful to "simplify" the images you get to ease processing)
synchronized ("turn-by-turn") play (so you don't need your algorithm to work in realtime at first, that's a huge complexity reduction).
additional convenience features such as crossplatform compatibility, retrocompatibility (you don't risk your bot not working with the game anymore when there is a new game update), etc.
To summarize, the great thing about these platforms is that they alleviate much of the previous technical issues you had to deal with (how to manipulate game inputs, how to setup scenarios, etc.) so that you just have to deal with the learning algorithm itself.
So now, get to work and make us the best AI visual bot ever ;)
Old post describing the technical issues of developping an AI relying only on visual inputs:
Contrary to some of my colleagues above, I do not think this problem is intractable. But it surely is a hella hard one!
The first problem as pointed out above is that of the representation of the state of the game: you can't represent the full state with just a single image, you need to maintain some kind of memorization (health but also objects equipped and items available to use, quests and goals, etc.). To fetch such informations you have two ways: either by directly accessing the game data, which is the most reliable and easy; or either you can create an abstract representation of these informations by implementing some simple procedures (open inventory, take a screenshot, extract the data). Of course, extracting data from a screenshot will either have you to put in some supervised procedure (that you define completely) or unsupervised (via a machine learning algorithm, but then it'll scale up a lot the complexity...). For unsupervised machine learning, you will need to use a quite recent kind of algorithms called structural learning algorithms (which learn the structure of data rather than how to classify them or predict a value). One such algorithm is the Recursive Neural Network (not to confuse with Recurrent Neural Network) by Richard Socher: http://techtalks.tv/talks/54422/
Then, another problem is that even when you have fetched all the data you need, the game is only partially observable. Thus you need to inject an abstract model of the world and feed it with processed information from the game, for example the location of your avatar, but also the location of quest items, goals and enemies outside the screen. You may maybe look into Mixture Particle Filters by Vermaak 2003 for this.
Also, you need to have an autonomous agent, with goals dynamically generated. A well-known architecture you can try is BDI agent, but you will probably have to tweak it for this architecture to work in your practical case. As an alternative, there is also the Recursive Petri Net, which you can probably combine with all kinds of variations of the petri nets to achieve what you want since it is a very well studied and flexible framework, with great formalization and proofs procedures.
And at last, even if you do all the above, you will need to find a way to emulate the game in accelerated speed (using a video may be nice, but the problem is that your algorithm will only spectate without control, and being able to try for itself is very important for learning). Indeed, it is well-known that current state-of-the-art algorithm takes a lot more time to learn the same thing a human can learn (even more so with reinforcement learning), thus if can't speed up the process (ie, if you can't speed up the game time), your algorithm won't even converge in a single lifetime...
To conclude, what you want to achieve here is at the limit (and maybe a bit beyond) of current state-of-the-art algorithms. I think it may be possible, but even if it is, you are going to spend a hella lot of time, because this is not a theoretical problem but a practical problem you are approaching here, and thus you need to implement and combine a lot of different AI approaches in order to solve it.
Several decades of research with a whole team working on it would may not suffice, so if you are alone and working on it in part-time (as you probably have a job for a living) you may spend a whole lifetime without reaching anywhere near a working solution.
So my most important advice here would be that you lower down your expectations, and try to reduce the complexity of your problem by using all the information you can, and avoid as much as possible relying on screenshots (ie, try to hook directly into the game, look for DLL injection), and simplify some problems by implementing supervised procedures, do not let your algorithm learn everything (ie, drop image processing for now as much as possible and rely on internal game informations, later on if your algorithm works well, you can replace some parts of your AI program with image processing, thus gruadually attaining your full goal, for example if you can get something to work quite well, you can try to complexify your problem and replace supervised procedures and memory game data by unsupervised machine learning algorithms on screenshots).
Good luck, and if it works, make sure to publish an article, you can surely get renowned for solving such a hard practical problem!

The problem you are pursuing is intractable in the way you have defined it. It is usually a mistake to think that a neural network would "magically" learn a rich reprsentation of a problem. A good fact to keep in mind when deciding whether ANN is the right tool for a task is that it is an interpolation method. Think, whether you can frame your problem as finding an approximation of a function, where you have many points from this function and lots of time for designing the network and training it.
The problem you propose does not pass this test. Game control is not a function of the image on the screen. There is a lot of information the player has to keep in memory. For a simple example, it is often true that every time you enter a shop in a game, the screen looks the same. However, what you buy depends on the circumstances. No matter how complicated the network, if the screen pixels are its input, it would always perform the same action upon entering the store.
Besides, there is the problem of scale. The task you propose is simply too complicated to learn in any reasonable amount of time. You should see aigamedev.com for how game AI works. Artitificial Neural Networks have been used successfully in some games, but in very limited manner. Game AI is difficult and often expensive to develop. If there was a general approach of constructing functional neural networks, the industry would have most likely seized on it. I recommend that you begin with much, much simpler examples, like tic-tac-toe.

Seems like the heart of this project is exploring what is possible with an ANN, so I would suggest picking a game where you don't have to deal with image processing (which from other's answers on here, seems like a really difficult task in a real-time game). You could use the Starcraft API to build your bot, they give you access to all relevant game state.
http://code.google.com/p/bwapi/

As a first step you might look at the difference of consecutive frames. You have to distinguish between background and actual monster sprites. I guess the world may also contain animations. In order to find those I would have the character move around and collect everything that moves with the world into a big background image/animation.
You could detect and and identify enemies with correlation (using FFT). However if the animations repeat pixel-exact it will be faster to just look at a few pixel values. Your main task will be to write a robust system that will identify when a new object appears on the screen and will gradually all the frames of the sprite frame to a database. Probably you have to build models for weapon effects as well. Those can should be subtracted so that they don't clutter your opponent database.

Well assuming at any time you could generate a set of 'outcomes' (might involve probabilities) from a set of all possible 'moves', and that there is some notion of consistency in the game (eg you can play level X over and over again), you could start with N neural networks with random weights, and have each of them play the game in the following way:
1) For every possible 'move', generate a list of possible 'outcomes' (with associated probabilities)
2) For each outcome, use your neural network to determine an associated 'worth' (score) of the 'outcome' (eg a number between -1 and 1, 1 being the best possible outcome, -1 being the worst)
3) Choose the 'move' leading to the highest prob * score
4) If the move led to a 'win' or 'lose', stop, otherwise go back to step 1.
After a certain amount of time (or a 'win'/'lose'), evaluate how close the neural network was to the 'goal' (this will probably involve some domain knowledge). Then throw out the 50% (or some other percentage) of NNs that were farthest away from the goal, do crossover/mutation of the top 50%, and run the new set of NNs again. Continue running until a satisfactory NN comes out.

I think your best bet would be a complex architecture involving a few/may networks: i.e. one recognizing and responding to items, one for the shop, one for combat (maybe here you would need one for enemy recognition, one for attacks), etc.
Then try to think of the simplest possible Diablo II gameplay, probably a Barbarian. Then keep it simple at first, like Act I, first area only.
Then I guess valuable 'goals' would be disappearance of enemy objects, and diminution of health bar (scored inversely).
Once you have these separate, 'simpler' tasks taken care of, you can use a 'master' ANN to decide which sub-ANN to activate.
As for training, I see only three options: you could use the evolutionary method described above, but then you need to manually select the 'winners', unless you code a whole separate program for that. You could have the networks 'watch' someone play. Here they will learn to emulate a player or group of player's style. The network tries to predict the player's next action, gets reinforced for a correct guess, etc. If you actually get the ANN you want this could be done with video gameplay, no need for actual live gameplay. Finally you could let the network play the game, having enemy deaths, level ups, regained health, etc. as positive reinforcement and player deaths, lost health, etc. as negative reinforcement. But seeing how even a simple network requires thousands of concrete training steps to learn even simple tasks, you would need a lot of patience for this one.
All in all your project is very ambitious. But I for one think it could 'in theory be done', given enough time.
Hope it helps and good luck!

Related

How to track Fast Moving Objects?

I'm trying to create an application that will be able to track rapidly moving objects in video/camera feed, however have not found any CV/DL solution that is good enough. Can you recommend any computer vision solution for tracking fast moving objects on regular laptop computer and web cam? A demo app would be ideal.
For example see this video where the tracking is done in hardware (I'm looking for software solution) : https://www.youtube.com/watch?v=qn5YQVvW-hQ
Target tracking is a very difficult problem. In target tracking you will have two main issues: the motion uncertainty problem, and the origin uncertainty problem. The first one refers to the way you model object motion so you can predict its future state, and the second refers to the issue of data association(what measurement corresponds to what track, and the literature is filled with scientific ways in which this issue can be approached).
Before you can come up with a solution to your problem you will have to answer some questions yourself, regarding the tracking problem you want to solve. For example: what are the values that you what to track(this will define your state vector), how are those values related to one another, are you trying to perform single object tracking or multiple object tracking, how are the objects moving( do they have a relatively constant acceleration or velocity ) or not, do objects make turns, can objects also be occluded or not and so on.
The Kalman Filter is good solution to predict the next state of your system (once you have identified your process model). A deep learning alternative to the Kalman filter is the so called Deep Kalman Filter which essentially is used to do the same thing. In case your process or measurement models are not linear, you will have to linearize them before predicting the next state. Some solutions that deal with non-linear process or measurement models are the Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF).
Now related to fast moving objects, an idea you can use is to have a larger covariance matrix since the objects can move a lot more if they are fast, so the search space for the correct association has to be a bit larger. Additionally you can use multiple motion models in case your motion model cannot be satisfied with only one model.
In case of occlusions I will leave you this stack overflow thread, where I have given an answer covering more details regarding occlusion handling in case of tracking. I have added some references for you to read. You will have to provide more details in your question, if you would like to receive more information regarding a solution (for example you should define fast moving objects with respect to camera frame rate).
I personally do not think there is a silver bullet solution for the tracking problem, I prefer to tailor a solution to the problem I am trying to solve.
The tracking problem is complicated. It is also more in the realm of control systems than computer vision. It would be also helpful to know more about your situation, as the performance of the chosen method pretty much depends on your problem constraints. Are you interested in real-time tracking? Are you trying to reconstruct an existing trajectory? Are there multiple targets? Just one? Are the physical properties of the targets (i.e. velocity, direction, acceleration) constant?
One of the most basic tracking methods is implemented by a Linear Dynamic System (LDS) description, in concrete, a discrete implementation, since we’re working with discrete frames of information. This method is purely based on physics, and its prediction is very sensitive. Depending on your application, the error rate could be acceptable… or not.
A more robust solution is Kalman’s Filter, and it is pretty much the go-to answer when tracking is needed. It implements prediction based on all the measurements obtained so far during the model's lifetime. It mainly works on constant-based measurements (velocity and acceleration) although it can be extended to handle non-constant models. If you are working with targets that won't exhibit a drastic change in their velocity, this is what you (probably) should implement.
I'm sorry I can't provide you with more, but the topic is pretty extensive and, admittedly, the details are beyond my area of expertise. Hopefully, this info should give you a little bit of context for finding a solution.
The problem of tracking fast-moving objects (FMO) is a known research topic in computer vision. FMOs are defined as objects which move over a distance larger than their size in one video frame. The solutions which have been proposed use classical image processing and energy minimization to establish their trajectories and sharp appearance.
If you need a demo app, I would suggest this GitHub repository: https://github.com/rozumden/fmo-cpp-demo. The demo is written in OpenCV/C++ and runs in real-time. The authors also provide a mobile app version, which is still in testing mode. Using this demo app you can track any fast moving objects in real-time without even providing an object model. However, if you provide object size in real-world units, the app can also estimate object speed.
A more sophisticated algorithm is open-sourced here: https://github.com/rozumden/deblatting_python, written in Python and PyTorch for speed-up. The repository contains a solution to the deblatting (deblurring and matting) problem, exactly what happens when a Fast Moving Object appears in front of a camera.

train a neural network on real subject input/output to have it behave similarly to subject

The goal is to create an AI to play a simple game, tracking a horizontally moving dot across the screen which increases speed until no longer tracked.
I would like to create an AI to behave similarly to a real test subject. I have a large amount of trials that were recorded of many months, position of dot on screen and user cursor position over time.
I would like to train the network on these trials so that the network behaves similarly to a real test subject and I can then obtain very large amounts of test data to observe how changing the parameters of the game affects the networks ability to track the moving dot.
I am interested in learning about the underlying code of neural networks and would love some advice on where to start with this project. I understand AIs can get very good at performing different tasks such as snake, or other simple games, but my goal would be to have the AI perform similarly to a real test subject.
Your question is a bit broad, but i'll try to answer nonetheless.
To imitate a subjects behavior you could use and LSTM network which has an understanding of the state it's in (in your case the state may include information about how fast and in which direction the dot is going and where the pointer is) and then decides on an action. You will need to feed your data (the dot coordination and users behavior) into the the network.
A simpler yet effective approach would be using simple MLP network. Your problem does not seem like a hard one and a simple network should be able to learn what a user would in a certain situation. However, based on what you mean by "perform similarly to a real test subject" you might need a more complex architecture.
Finally there are GAN networks, which are somewhat complicated if you're not familiar with NNs, are hard and time-consuming to train and in some cases might fail to train at all. The bright side is they are exactly designed to imitate a probability distribution (or to put it more simply, a set of data).
There are two more important notes to mention:
The performance of your network depends heavily on your data and the game. for example, if in your dataset users have acted very differently to the same situation MLP or LSTMs will not be able to learn all those reactin.
Your network can only imitate what it's taught. So if you're planning to figure out what a human agent would do under some conditions that never happened in your dataset (e.g. if in your dataset the dot only moves in a line but you want it to move in a circle when experimenting) you won't get good results.
hope this helps.

Needed advice for an enjoyable AI in Buraco card game

i’m trying to build an effective AI for the Buraco card game (2 and 4 players).
I want to avoid the heuristic approach : i’m not an expert of the game and for the last games i’ve developed this way i obtained mediocre results with that path.
I know the montecarlo tree search algorithm, i’ve used it for a checkers game with discrete result but I’m really confused by the recent success of other Machine Learning options.
For example i found this answer in stack overflow that really puzzles me, it says :
"So again: build a bot which can play against itself. One common basis is a function Q(S,a) which assigns to any game state and possible action of the player a value -- this is called Q-learning. And this function is often implemented as a neural network ... although I would think it does not need to be that sophisticated here.”
I’m very new to Machine Learning (this should be Reinforcement Learning, right?) and i only know a little of Q-learning but it sounds like a great idea: i take my bot, making play against itself and then it learns from its results… the problem is that i have no idea how to start! (and neither if this approach could be good or not).
Could you help me to get the right direction?
Is the Q-learning strategy a good one for my domain?
Is the Montecarlo still the best option for me?
Would it work well in a 4 players game like Buraco (2 opponents and 1 team mate)?
Is there any other method that i’m ignoring?
PS: My goal is to develop an enjoyable AI for a casual application, i can even consider the possibility to make the AI cheating for example by looking at the players hands or deck. Even with this, ehm, permission i would not be able to build a good heuristic, i think :/
Thank you guys for your help!
EDIT: thinking again about your knowledge in MCTS and your implementation of checkers, I think my answer is formulated too pedagogical -- you probably know much of this if you know MCTS. Nevertheless, I hope it helps. Feel free to ask specific questions in the comments.
I was the one to suggest you to open a new question, so I fell I should at least give an answer here. However, right at the beginning, I want to clarify about what should be expected here. The things you are asking for are quite complex and require some solid experience in the realization. I had the occasion to implement optimal strategies for quite a few games, and I'd consider this here still a heavy challenge. So I think the goal here is to get some overview -- and not a proper solution (which I couldn't give anyhow as I don't know the game of Buraco).
As stated in the other answer, the theory of reinforcement learning provides a solid foundation. A good introduction is the book of Sutton and Barto with the same title. It's really readible and I'd suggest to get through the first five chapters or so (these cover Dynamic Programming and Monte Carlo). Two points in which this book is not sufficient are: (i) it does not give explicit application to two-player games, and (ii) it mainly relies on tabular representations of the value functions (--no neural networks and the like are involved).
A elementary part of the implementation is the state S of the game. This state should capture the complete current status of the game as compact and non-redundant as possible. Further, for each state S, you need to be able to assign the available actions A (like taking a card). Moreover, depending on the method you want to apply, it helps to know the probability distribution p(S'|S,A) which gives the probability to end up in state S' when you are in state S and do action A. And finally, you need to specify the reward r(S'|S,A) when you are in state S and do the action A to end in S' (for two player zero-sum games the reward can be chosen quite easily: when S' is the state where you win, you get +1 as reward, if you lose -1, otherwise 0 -- this however does not hold for games with more players or non-zero-sum games).
From a given state S, you further can derive the reduced states S_P1, S_P2, etc. which the single players see. Those states capture the information of a specific player (which is of course less than the complete state -- a player does not know the cards of the opponent, and also not the order of cards in the deck, for example). These reduced states provide the base on which players make their decisions. So, the goal for player 1 is to get a function Q_1(S_P1, A) which tells him: when I'm in state S_P1, I should at best do the action A (this is the idea of Q-learning). And the same holds for the other players. These functions have to be trained so that optimal results come out.
The way to do so is through the central equation of reinforcement learning, the Bellman equation. There are several solution methods like value iteration, Monte Carlo (this is basically the method you referenced in your link), temporal difference methods, and so on. Some of these methods can be interpreted as your digital players which repeatedly play against each other and in this process hopefully get better each. However, this is not guaranteed, it is easy to get stuck in local minima like in any optimization problem. At best you already have a good digital player at hand (from somewhere) and you train only one player to beat him -- in this way you reduce the two-player problem to a single-player problem which is much easier.
I'll make a cut here, because this stuff is really better to grasp from the book, and because I know this answer won't help you practically even if it were ten pages long. Here one last suggestion how to proceed: (i) get the foundations from the book, (ii) implement some toy problems therein, (iii) extend the stuff to two player games -- see also the minimax theorem for this, (iv) solve some more easy games (like tic-tac-toe, even this takes long), (v) get acquainted with neural networks and all this stuff, and (vi) tackle Buraco -- that's a tough program, however.

Online machine learning for obstacle crossing or bypassing

I want to program a robot which will sense obstacles and learn whether to cross over them or bypass around them.
Since my project, must be realized in week and a half period, I must use an online learning algorithm (GA or such would take a lot time to test because robot needs to try to cross over the obstacle in order to determine is it possible to cross).
I'm really new to online learning so I don't really know which online learning algorithm to use.
It would be a great help if someone could recommend me a few algorithms that would be the best for my problem and some link with examples wouldn't hurt.
Thanks!
I think you could start with A* (A-Star)
It's simple and robust, and widely used.
There are some nice tutorials on the web like this http://www.raywenderlich.com/4946/introduction-to-a-pathfinding
Online algorithm is just the one that can collect new data and update a model incrementally without re-training with full dataset (i.e. it may be used in online service that works all the time). What you are probably looking for is reinforcement learning.
RL itself is not a method, but rather general approach to the problem. Many concrete methods may be used with it. Neural networks have been proved to do well in this field (useful course). See, for example, this paper.
However, to create real robot being able to bypass obstacles you will need much then just knowing about neural networks. You will need to set up sensors carefully, preprocess data from them, work out your model and collect a dataset. Not sure it's possible to even learn it all in a week and a half.

Does a learning AI make sense as an opponent in a game?

I want to build an AI opponent for a simple game of four in a row. However, I don't simply want to create a perfect player, which would be rather boring for the human. Instead, I would like to have an AI that practically starts from zero and learns the game over time.
The only approach to this that I know of are artificial neuronal networks. However, it seems that these usually require supervised learning. Also, this document, for example, states that the AI only approaches being a perfect player after about 20k games - a bit too much for a human to play.
Therefore I wonder: Is it possible to make reasonable use of a learning AI in a simple game? Are there any suitable alternatives or extensions to neuronal networks to do this job?
I don't know of any algorithm or technique off the top of my head that will let a computer learn a game on anything comparable to the same time scale as a human being. But we have to be careful when we talk about time scale.
There is, for instance, a technique developed by Fogel and Chellapilla, which plays a bunch of randomly generated neural networks against each other, and then uses a genetic algorithm to create new and better neural networks based on the results. This was originally done with checkers, but would be applicable to many games. That technique at least removes the burden of human training-- the networks are playing against themselves.
But how fast does that learn? Fogel and Chellapilla got good quality results (Class A performance, which is just under rated Expert) on checkers in only about 250 generations... but each generation's tournament included about 150 separate games, for about 37k games total. If you played a game a day, it would take you 100 years to play that many. Maybe people who play at that level have played ten games a day for ten years, but that seems... unlikely. So in that sense, slower than a human being. On the other hand, a good laptop can probably play that many games in a week, which no human could ever do.
So if you're looking for a training routine where a human being will be able to train and perceive the performance increase on a reasonable scale... I know of nothing that can do that, today. (Which stands to reason-- our best supercomputers still don't have the raw processing power of a human brain, and we have no algorithms designed to take advantage of that much power, yet.)
If you're just looking for an imperfect AI, though, you might try a technique like Fogel's and Chellapilla's, and instead of taking the ultimate, near-expert rated results, just take something from halfway through the run, or something from the last generation but not the best result.
You might want to looking into the field called "General Game Playing". The focus there is on learning to play games that the computer has never seen before. The algorithm is handed the rules of the game in a well-defined format and it has to learn to play from scratch.
The state of the art techniques pretty much always incorporate some sort of Monte-Carlo simulation where the system plays thousands of games against itself in simulation as it simultaneously plays the "real" games against humans or other programs that it's being measured on.

Resources