Weight updates and estimating training example values in playing checkers - machine-learning

I am reading the first chapter of Tom Mitchell's Machine Learning book.
What I want to do is write a program that plays checkers against itself and eventually learns to win. My question is about credit assignment for the non-terminal board positions it encounters. Maybe we can set the value of a position using a linear combination of its features with random initial weights, but how do we update those weights with the LMS rule, given that we don't have training samples apart from the ending states?
I am not sure whether I have stated my question clearly, although I tried to.

I haven't read that specific book, but my approach would be the following. Suppose that White wins. Then, every position White passed through should receive positive credit, while every position Black passed through should receive negative credit. If you iterate this reasoning, whenever you have a set of moves making up a game, you should add some amount of score to all positions from the victor and remove some amount of score from all positions from the loser. You do this for a bunch of computer vs. computer games.
You now have a data set made up of a bunch of checker positions and respective scores. You can now compute features over those positions and train your favorite regressor, such as LMS.
An improvement to this approach would be to train the regressor, then play some more games where each move is randomly drawn according to the predicted score of that move (i.e. moves which lead to positions with higher scores have higher probability). Then you update those scores, re-train the regressor, and so on.
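To make this concrete, here is a minimal sketch of that pipeline, assuming a hypothetical extract_features function and illustrative credit values of +1 for every position the winner passed through and -1 for the loser's; the regressor is fit with the LMS (Widrow-Hoff) update.

```python
import numpy as np

def extract_features(board):
    # Hypothetical feature extractor: piece counts, king counts, mobility, etc.
    # Replace with whatever numeric features you compute for a checkers position.
    return np.asarray(board, dtype=float)

def score_games(games):
    """games: list of (winner_positions, loser_positions) tuples from self-play."""
    data = []
    for winner_positions, loser_positions in games:
        data += [(extract_features(b), +1.0) for b in winner_positions]
        data += [(extract_features(b), -1.0) for b in loser_positions]
    return data

def lms_train(data, n_features, lr=0.01, epochs=10):
    w = np.zeros(n_features + 1)                  # +1 for a bias weight
    for _ in range(epochs):
        for x, target in data:
            x = np.append(x, 1.0)                 # bias input
            prediction = w @ x
            w += lr * (target - prediction) * x   # LMS / Widrow-Hoff rule
    return w
```

In Mitchell's chapter the same machinery appears with bootstrapped targets, i.e. the training value of a non-terminal position is taken from the current estimate of its successor's value rather than from a fixed end-of-game credit, so you could swap that in once the fixed-credit version works.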

Related

Generative Model, but sampling from quantities with physical meaning?

To my understanding, the canonical generative models (GAN, VAE) sample data from what is basically random noise, or, more specifically for the latter, from a learned latent distribution over which we have no real control. The generative process is usually to take a value at random, put it into the model, and get data back.
But I want this random value to have a meaning and to be constrained by my needs.
I work in particle physics, but for the sake of analogy let's say that my data is a collection of photos of bullet impacts on bulletproof glass (in reality: particle interactions in a detector).
The way the glass shatters will of course depend on the bullet type, but also on the bullet's velocity and angle of incidence: one categorical variable and two continuous ones.
I want to generate more photos, but in the new photo I want to control the angle of the bullet and its velocity. In other words, I want to tell the model: "give me impacts of a .22 cal striking at 900 m/s and 10° incidence" and get in return photos that could have resulted from such an impact.
How do I do that?
I realise there may be established techniques, but I am lacking the keywords or names to search for them. Any starting point is very welcome.

What is wrong with my approach of using MLP to make a chess engine?

I’m making a chess engine using machine learning, and I’m experiencing problems debugging it. I need help figuring out what is wrong with my program, and I would appreciate any help.
I did my research and borrowed ideas from multiple successful projects. The idea is to use reinforcement learning to teach a neural network to differentiate between strong and weak positions.
I collected 3 million games with Elo over 2000 and used my own method to label them. After examining hundreds of games, I found that it's safe to assume that in the last 10 turns of any game the balance doesn't change and the winning side has a strong advantage. So I picked positions from the last 10 turns and made two labels: one for a win for white and zero for a win for black. I didn't include any draw positions. To avoid bias, I picked equal numbers of positions labeled as wins for each side, and equal numbers of positions where each side has the next move.
Each position is represented by a vector of 773 elements. Every piece on every square of the chess board, together with the castling rights and the side to move, is encoded with ones and zeros. My sequential model has an input layer with 773 neurons and an output layer with a single neuron. I used a three-hidden-layer MLP with 1546, 500, and 50 hidden units for layers 1, 2, and 3 respectively, with a dropout rate of 20% on each. Hidden layers use the non-linear ReLU activation, while the final output layer has a sigmoid activation. I used the binary cross-entropy loss function and the Adam algorithm with all default parameters, except for the learning rate, which I set to 0.0001.
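For reference, my reading of that architecture in Keras would be roughly the following; it is a sketch to compare against your actual code, not a claim about what you wrote:

```python
import tensorflow as tf
from tensorflow.keras import layers

# 773 binary inputs, three hidden ReLU layers (1546 / 500 / 50) each followed by
# 20% dropout, one sigmoid output, binary cross-entropy, Adam with lr = 1e-4.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(773,)),
    layers.Dense(1546, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(500, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(50, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```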
I used 3 percent of the positions for validation. During the first 10 epochs, validation accuracy gradually went up from 90 to 92%, just one percent behind training accuracy. Further training led to overfitting, with training accuracy going up, and validation accuracy going down.
I tested the trained model on multiple positions by hand and got pretty bad results. Overall, the model can predict which side is winning if that side has more pieces or has pawns close to promotion. It also gives the side with the next move a small advantage (0.1). But overall it doesn't make much sense. In most cases it heavily favors black (by ~0.3) and doesn't properly take the setup into account. For instance, it labels the starting position as ~0.0001, as if black had an almost 100% chance to win. Sometimes an irrelevant transformation of a position results in an unpredictable change in the evaluation. One king and one queen for each side is usually evaluated as a lost position for white (0.32), unless the black king is on a certain square, even though that doesn't really change the balance on the chessboard.
What I did to debug the program:
To make sure I have not made any mistakes, I analyzed, step by step, how each position is recorded. Then I picked a dozen positions from the final numpy array, right before training, and converted them back to analyze them on a regular chess board.
I used various numbers of positions from the same game (1 and 6) to make sure that using too many similar positions is not the cause of the fast overfitting. By the way, even one position per game in my database results in a data set of 3 million positions, which should be sufficient according to some research papers.
To make sure that the positions I use are not too simple, I analyzed them: 1.3 million of them had 36 points' worth of pieces (knights, bishops, rooks, and queens; pawns were not included in the count), 1.4 million had 19 points, and only 0.3 million had less.
Some things you could try:
Add unit tests and asserts wherever possible. E.g. if you know that some value is never supposed to get negative, add an assert to check that this condition really holds (see the sketch after this list).
Print shapes of all tensors to check that you have really created the architecture you intended.
Check if your model outperforms some simple baseline model.
You say your model overfits, so maybe simplify it / add regularization?
Check how your model performs on the simplest positions. E.g. can it recognize a checkmate?
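As an illustration of the first point, a sanity check on the 773-element encoding might look like the sketch below; encode_position is a hypothetical stand-in for your encoder, and the expected bit count assumes the 64×12 piece planes + 4 castling rights + 1 side-to-move layout you describe.

```python
import numpy as np

def test_start_position_encoding(encode_position):
    # encode_position is assumed to map a FEN string to a length-773 0/1 vector.
    start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    v = np.asarray(encode_position(start_fen))
    assert v.shape == (773,), "wrong vector length"
    assert set(np.unique(v)) <= {0, 1}, "encoding should be strictly binary"
    # 32 pieces + 4 castling rights + 1 side-to-move bit in the initial position
    # (assumes white-to-move is encoded as 1).
    assert v.sum() == 37, "unexpected number of set bits for the start position"
```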

k-medoids: How are new centroids picked?

My understanding of k-medoids is that the centroids are picked randomly from the existing points. Clusters are then formed by assigning the remaining points to the nearest centroid, and the error is calculated (absolute distance).
a) How are new centroids picked? From the examples it seems that they are picked randomly, and the error is calculated again to see whether those new centroids are better or worse?
b) How do you know that you need to stop picking new centroids?
It's worth reading the Wikipedia page on the k-medoids algorithm. You are right that the k medoids are selected randomly from the n data points in the first step.
The new medoids are picked by swapping every medoid m with every non-medoid o in a loop and calculating the total distance again. If the cost increased, you undo the swap.
The algorithm stops if there is no swap for a full iteration.
The process for choosing the initial medoids is fairly complicated; many people seem to just use random initial centers instead.
After this, k-medoids always considers every possible change of replacing one of the medoids with a non-medoid. The best such change is then applied, if it improves the result. If no further improvements are possible, the algorithm stops.
Don't rely on vague descriptions. Read the original publications.
Before answering, a brief overview of k-medoids is needed; I give it in the first two steps below, and the last two steps answer your questions.
1) The first step of k-medoids is that k centroids/medoids are picked from your dataset. Suppose your dataset contains 'n' points; the k medoids are chosen from these 'n' points. You can choose them randomly, or you can use a smart-initialization approach like the one used in k-means++.
2) The second step is the assignment step, wherein you take each point in your dataset, find its distance to each of the k medoids, find the medoid for which this distance is minimal, and add the data point to the set S_j corresponding to the centroid C_j (as we have k centroids C_1, C_2, ..., C_k).
3) The third step of the algorithm is the update step. This answers your question about how new centroids are picked after they have been initialized. I will explain the update step with an example to make it clearer.
Suppose you have ten points in your dataset, (x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_10). Now suppose our problem is a 2-cluster one, so we first choose 2 centroids/medoids randomly from these ten points; let's say those 2 medoids are (x_2, x_5). The assignment step stays the same. In the update step, you take a point which is not a medoid (a point apart from x_2, x_5), treat it as a candidate medoid, and repeat the assignment step to compute the loss, which is the sum of the distances of the points x_i from their medoids. You then compare the loss obtained with medoid x_2 against the loss obtained with the non-medoid point. If the loss is reduced, you swap x_2 with the non-medoid point that reduced it; if the loss is not reduced, you keep x_2 as your medoid and don't swap.
So there can be a lot of swaps in the update step, which also makes this algorithm computationally expensive.
4) The last step answers your second question, i.e. when should one stop picking new centroids. When you compare the loss with the current medoid against the loss computed with a non-medoid, if the improvement is negligible you can stop and keep the current medoid as the centroid; but if the improvement is significant, you keep performing swaps until the loss no longer decreases. (A code sketch of this swap loop follows.)
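For concreteness, here is a minimal sketch of the swap-based update from steps 3 and 4 (PAM-style); it assumes Euclidean distance and is meant as an illustration, not an optimized implementation:

```python
import numpy as np

def total_cost(points, medoid_idx):
    # Each point pays the distance to its closest medoid (Euclidean here).
    d = np.linalg.norm(points[:, None, :] - points[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()

def k_medoids(points, k, seed=0):
    rng = np.random.default_rng(seed)
    n = len(points)
    medoids = list(rng.choice(n, size=k, replace=False))  # step 1: random initial medoids
    best = total_cost(points, medoids)
    improved = True
    while improved:                        # step 4: stop when no swap improves the cost
        improved = False
        for i in range(k):                 # step 3: try swapping each medoid ...
            for o in range(n):             # ... with each non-medoid
                if o in medoids:
                    continue
                candidate = medoids.copy()
                candidate[i] = o
                cost = total_cost(points, candidate)
                if cost < best:            # keep the swap only if the total cost decreases
                    best, medoids, improved = cost, candidate, True
    return medoids, best
```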
I hope that answers your questions.

Training neural network on a sequence of moves

I've been playing with neural networks just out of personal curiosity and a desire to learn. So far, it's been a success. I've written code for one from scratch, with backpropagation for training, and trained it to play tic-tac-toe (it outputs a 3x3 matrix, and the open spot with the highest value is played). It's based on this example, except designed to allow multiple hidden layers.
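For reference, the move-selection step in that parenthetical could be sketched like this (illustrative only; it assumes the board marks open squares with 0):

```python
import numpy as np

def pick_move(net_output, board):
    # The network outputs a 3x3 matrix of values; play the open square with
    # the highest value by masking out squares that are already taken.
    scores = np.asarray(net_output, dtype=float).reshape(3, 3)
    scores[np.asarray(board) != 0] = -np.inf
    return np.unravel_index(np.argmax(scores), (3, 3))   # (row, col) of the chosen move
```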
The training data is randomly generated mid-game situations, and to supervise the learning I wrote an algorithm by hand for ranking the possible moves, to use as the "correct" answers. After training it a couple thousand times, it works pretty well at making the best move and can play a whole game decently (it favors opening in the center, whereas a perfect game opens in a corner, but whatever).
Anyway, that's all well and good, but the training required me to create a specific algorithm to rank each move for any given position, which was easy enough because tic-tac-toe is very simple, but this isn't practical. My next milestone would be to train it based only on winning or losing the game. However, this requires it to remember a sequence of moves and then backpropagate the training not just through the neurons but through the sequence of moves, after the game is finished. I just have no idea where to start; even being pointed in the right direction would help.

Algorithm for variability analysis

I work with a lot of histograms. In particular, these histograms are of basecalls along segments on the human genome.
Each point along the x-axis is one of the four nitrogenous bases (A, C, T, G) that compose DNA, and the y-axis represents how many times a base was able to be "called" (i.e. recognized by a sequencing machine; sequencing a genome is simply determining the identity of each base along it).
Many of these histograms display roughly linear dropoffs (when the machines aren't able to get sufficient read depth) that fall to 0 (or almost 0) from plateau-like regions. When the score drops to zero, it means the sequencer isn't able to determine the identity of the base. If you've seen the double helix before, it means the sequencer can't figure out the identity of one half of a rung of the helix. Certain regions of the genome are more difficult to characterize than others. Bases (or x data points) with high numbers of basecalls, on the order of >=100, can be definitively identified. For example, if there were a total of 250 calls for one base, and we had 248 T's called, 1 G called, and 1 A called, we would call that base a T. Regions with 0 basecalls are of concern because then we have to infer the identity of the low-read region from neighboring regions. Is there a straightforward algorithm for assigning these plots a score that reflects this tendency? See box.net/shared/nbygq2x03u for an example histogram.
You could just use the count of base positions where the read depth was 0... The slope of the dropoff line could also be a useful indicator (steep negative slope = drop from the plateau).
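A sketch of both suggestions, assuming depth is a 1-D array of basecall counts along the segment (the names are illustrative):

```python
import numpy as np

def variability_score(depth):
    depth = np.asarray(depth, dtype=float)
    zero_bases = int((depth == 0).sum())   # number of positions with no basecalls at all
    x = np.arange(len(depth))
    slope = np.polyfit(x, depth, 1)[0]     # overall linear trend; steep negative = dropoff from plateau
    return zero_bases, slope
```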
