Training padding on a predictive model - machine-learning

Currently I have a model that tries to predict how long it takes a couple of restaurants to make my order. After training, roughly 60% of its predictions are under the actual value and 40% are over. I want to train another model on top of this as a "padding" model, so that if I add the padding model's output to the base prediction, around 90% of the results come out early.
What is the best way to go about creating this padding model? I want to optimize it so that it adds the least amount of padding possible while maintaining the constraint that 90% of the final predictions are early (predicted value > actual value).
Thanks!
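For what it's worth, here is a minimal sketch of one way to build such a padding model: train the second model on the residuals of the first with a quantile objective, so it learns roughly the 90th percentile of how much the base model under-predicts. The variable names (X, base_pred, actual) and the use of scikit-learn's quantile loss are my own assumptions, not something stated in the question.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# X: feature matrix; base_pred: base-model predictions; actual: true prep times (hypothetical names)
residuals = actual - base_pred
# a constant padding would simply be np.quantile(residuals, 0.9);
# a quantile regressor makes the padding feature-dependent, so it stays as small as possible
padding_model = GradientBoostingRegressor(loss="quantile", alpha=0.9)
padding_model.fit(X, residuals)
padded_pred = base_pred + padding_model.predict(X)   # ~90% of these should now be >= actual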

Related

Deeplabv3+ model acts marginalized

I can train the deeplabv3+ model on my dataset and it gives good results.
But my problem is that the model assigns very extreme probabilities to pixels.
It treats pixels as either black (does not belong to a class at all) or white (definitely belongs to a class).
For example, when it is sure a pixel belongs to a person, it assigns a probability of 0.99; but when it encounters a pixel it is unsure about, say a blurred pixel on a person's hand, it assigns a probability of 0.04 or lower of being human, where my expectation would be a value around 0.4 to 0.6.
Getting such intermediate values is important for me.
I've tried weighting the critical parts of my dataset (like hands, ..) to make it harder for the model to learn shapes, but the problem persists.
I know that my model is not over-fitted in the usual sense (the mIoU of my model on the test dataset is low, as expected), but it behaves in the polarized ("marginalized") way described earlier.
Any ideas would be appreciated.
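For context, a minimal sketch of the kind of per-pixel weighting mentioned above, assuming a PyTorch-style setup; the tensor shapes and names are illustrative, not taken from the question.
import torch
import torch.nn.functional as F

logits = torch.randn(2, 3, 64, 64, requires_grad=True)   # (N, C, H, W) raw model outputs
target = torch.randint(0, 3, (2, 64, 64))                # (N, H, W) ground-truth class indices
weight_map = torch.ones(2, 64, 64)                       # per-pixel weights, e.g. larger on hands

per_pixel_loss = F.cross_entropy(logits, target, reduction="none")   # one loss value per pixel
loss = (per_pixel_loss * weight_map).mean()              # up-weight the critical regions
loss.backward()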

How to deal with this task using machine learning?

Question
I'm trying to handle the following task with machine learning, but the performance is not very good. I'm not really familiar with machine learning and data science, so I don't have much background knowledge. Do you know of similar tasks from the past, for example on Kaggle?
Task
The dataset consists of several queries and a list of contents for each query.
Each content in a query is labeled 0 or 1.
Almost all contents in each query have the label 0.
There is only 0 or 1 content with label 1 in each query.
I want the model to give its highest output to the content with label 1 in each query.
I don't care about the order of, or the differences in output among, the contents with label 0; I just want the content with label 1 to come out at No. 1 in each query.
Of course, if the model gives exactly the same output to all contents, the content with label 1 technically ties for No. 1 in the ranking, but that is meaningless.
What I did
At first, I didn't look at the dataset query by query, and treated it as a classification task (0 vs. 1). The model could classify a content with label 1 as 1, but sometimes a content with label 0 was also classified as 1, with a higher score than the content with label 1 in the same query. Since the order (the content with label 1 coming out at No. 1 in each query) is the most important thing, I'm using learning to rank now.
Problems I'm facing
I've visited many websites describing learning to rank, but I can't find a case like this, and I don't know what to call it; something like "binary ranking".
I'm using the LambdaRank method, which scales the gradient of the loss function (cross entropy), because I expect it to help bring the content with label 1 to the top of the list in each query. I'm using LightGBM or PyTorch (see the sketch after the ideas list below). Now I'm facing several problems:
Because almost all contents have the label 0, the model can make the loss small by predicting 0 for everything. The gradient of the loss function is then almost 0, so training does not progress, and all contents end up tied at No. 1 in the ranking.
(In PyTorch) training depends on how it starts. In many cases the model predicts 0 for every content at the first epoch, and then training does not progress, as described above. I'm not sure why, but sometimes the content with label 1 is at the top of the ranking for about 10% of queries, and in that case training does progress.
(Not yet confirmed with PyTorch.) After training, about 80% of the contents with label 1 are No. 1 in their queries. But there are several queries that have no content with label 1, only contents with label 0. I want to filter out such queries, so I tried a score cut-off, but it was not effective. I suspect there is no consistency of predictions across queries. Say there are two queries with predictions query1 [A: 0.9, B: 0.6, C: 0.1] and query2 [D: 0.7, E: 0.2]; is A really more relevant than D with respect to its own query?
Ideas I've not tried yet
For the problem that training does not progress because of the many contents with label 0:
Use another loss function, such as focal loss.
Use only the gradient of the loss at the prediction for the content with label 1 when updating model parameters.
To reduce the number of contents with label 0 tied at No. 1 in the ranking, even when the content with label 1 is also at No. 1:
Create a custom metric that penalizes them.
Create a custom metric that compares the prediction for the content with label 1 with the highest-scoring content in the list other than the content with label 1.
These metrics are probably not differentiable, but I think I could use them to scale the gradient of the loss function instead of NDCG, as in the LambdaRank method.
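For reference, a minimal sketch of the LambdaRank setup with LightGBM for this kind of data (at most one relevant content per query); the array names, dummy data, and parameter values are illustrative assumptions, not taken from the question.
import numpy as np
import lightgbm as lgb

# X: features of all contents, concatenated query by query
# y: binary relevance labels (0 or 1), at most one 1 per query
# group: number of contents in each query, in the same order as X
X = np.random.rand(100, 5)
y = np.zeros(100, dtype=int)
y[::10] = 1                      # one relevant content in each query of 10
group = [10] * 10

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=200)
ranker.fit(X, y, group=group)
scores = ranker.predict(X)       # rank contents within each query by these scores
Note that such scores are generally only comparable within a query, not across queries, which is one reason a global score cut-off across queries tends not to work well.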

A more imbalanced approach to compute_class_weight

I have a large multi-label array with numbers between 0 and 65. I'm using the following code to generate class weights:
class_weights = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(labels), y=labels)
where labels is the array containing the numbers between 0 and 65.
I'm using this to fit a model with the class_weight flag. The reason is that I have many examples of "0" and "1" but only a small number of examples with labels > 1, and I wanted the model to give more weight to the examples with lower counts. This helped a lot; however, I can now see that the model gives too much weight to the rare examples and somewhat neglects the examples with the highest counts (1 and 0). I'm trying to find a middle ground and would love some tips on how to keep going.
This is something you can achieve in two ways, provided you have done the weight assignment correctly, i.e. given higher weights to less frequent labels and vice versa, which you presumably have already done.
Reduce the number of samples of the highly frequent labels (0 and 1 in your case) to a level closer to the other labels, provided it does not shrink your dataset by too big a margin. However, this is often not feasible when the other, less frequent labels are very rare, and it is something you will have to decide on.
The other, and most plausible, solution is to either oversample the less frequent labels by creating copies of them, or undersample the most frequent labels.
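A minimal sketch of the oversampling idea from the last point, using scikit-learn's resample; labels is the label array from the question, while features and the target count are illustrative placeholders.
import numpy as np
from sklearn.utils import resample

counts = np.bincount(labels)
target_count = counts.max() // 4          # a "middle ground": rare classes are upsampled, but not all the way to the majority count

parts_X, parts_y = [], []
for cls in np.unique(labels):
    mask = labels == cls
    X_cls, y_cls = features[mask], labels[mask]
    if mask.sum() < target_count:
        X_cls, y_cls = resample(X_cls, y_cls, replace=True, n_samples=target_count, random_state=0)
    parts_X.append(X_cls)
    parts_y.append(y_cls)

features_bal = np.concatenate(parts_X)
labels_bal = np.concatenate(parts_y)      # then fit without class_weight, or with milder weights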

What is wrong with my approach of using MLP to make a chess engine?

I’m making a chess engine using machine learning, and I’m experiencing problems debugging it. I need help figuring out what is wrong with my program, and I would appreciate any help.
I did my research and borrowed ideas from multiple successful projects. The idea is to use reinforcement learning to teach a NN to differentiate between strong and weak positions.
I collected 3 million games with Elo over 2000 and used my own method to label them. After reviewing hundreds of games, I found that it's safe to assume the balance doesn't change in the last 10 turns of any game and the winning side has a strong advantage there. So I picked positions from the last 10 turns and used two labels: one for a win for white and zero for a win for black. I didn't include any drawn positions. To avoid bias, I picked an equal number of positions labeled as wins for each side, and an equal number of positions with each side to move.
I represented each position by a vector of length 773: every piece on every square of the chessboard, together with castling rights and the side to move, encoded as ones and zeros. My sequential model has an input layer with 773 neurons and an output layer with one single neuron. I used a three-hidden-layer MLP with 1546, 500, and 50 hidden units for layers 1, 2, and 3 respectively, with a dropout rate of 20% on each. Hidden layers use the non-linear ReLU activation, while the final output layer has a sigmoid activation. I used the binary cross-entropy loss function and the Adam optimizer with all default parameters, except for the learning rate, which I set to 0.0001.
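For concreteness, here is how I read that architecture, sketched in Keras; this is my reconstruction from the description above, not the author's actual code.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(773,)),              # piece placement + castling rights + side to move
    layers.Dense(1546, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(500, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(50, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),   # 1 = win for white, 0 = win for black
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)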
I used 3 percent of the positions for validation. During the first 10 epochs, validation accuracy gradually went up from 90% to 92%, just one percent behind training accuracy. Further training led to overfitting, with training accuracy going up and validation accuracy going down.
I tested the trained model on multiple positions by hand and got pretty bad results. Overall, the model can predict which side is winning if that side has more pieces or pawns close to a promotion square. It also gives the side to move a small advantage (0.1). But overall it doesn't make much sense. In most cases it heavily favors black (by ~0.3) and doesn't properly take the setup into account. For instance, it labels the starting position as ~0.0001, as if black had almost a 100% chance to win. Sometimes an irrelevant transformation of a position results in an unpredictable change in the evaluation. One king and one queen per side is usually evaluated as a lost position for white (0.32), unless the black king is on a certain square, even though that doesn't really change the balance on the board.
What I did to debug the program:
To make sure I had not made any mistakes, I analyzed, step by step, how each position is recorded. Then I picked a dozen positions from the final NumPy array, right before training, and converted them back so I could analyze them on a regular chessboard.
I used various numbers of positions per game (1 and 6) to make sure that using too many similar positions is not the cause of the fast overfitting. Even one position per game in my database results in a dataset of 3 million samples, which should be sufficient according to some research papers.
To make sure that the positions I use are not too simple, I analyzed them: 1.3 million of them had 36 points of material (knights, bishops, rooks, and queens; pawns were not included in the count), 1.4 million had 19 points, and only 0.3 million had less.
Some things you could try:
Add unit tests and asserts wherever possible. E.g. if you know that some value is never supposed to get negative, add an assert to check that this condition really holds.
Print shapes of all tensors to check that you have really created the architecture you intended.
Check if your model outperforms some simple baseline model.
You say your model overfits, so maybe simplify it / add regularization?
Check how your model performs on the simplest positions. E.g. can it recognize a checkmate?
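On the baseline point, a minimal sketch of a material-count baseline to compare the model against; the piece values and the FEN-based encoding are my own illustrative choices.
import math

PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}   # pawn, knight, bishop, rook, queen

def material_baseline(fen: str) -> float:
    """Score in (0, 1): above 0.5 means white is ahead in material."""
    placement = fen.split()[0]                 # piece-placement field of the FEN string
    balance = 0
    for ch in placement:
        if ch.lower() in PIECE_VALUES:
            value = PIECE_VALUES[ch.lower()]
            balance += value if ch.isupper() else -value   # uppercase = white, lowercase = black
    return 1 / (1 + math.exp(-balance / 4))    # squash to (0, 1) to match the sigmoid output

# e.g. the starting position should come out at exactly 0.5:
print(material_baseline("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))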

Class labels in data partitions

Suppose one partitions the data into training/validation/test sets for further application of some classification algorithm, and it happens that the training set doesn't contain all the class labels present in the complete dataset; say some records with label "x" appear only in the validation set and not in the training set.
Is this a valid partitioning? It can have several consequences: the confusion matrix would no longer be square, and any error we evaluate during the algorithm would be affected by labels unseen in the training set.
The second question is: is it common for partitioning algorithms to take care of this issue and partition the data so that the training set contains all existing labels?
This is what stratified sampling is supposed to solve.
https://en.wikipedia.org/wiki/Stratified_sampling
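For example, a minimal sketch of a stratified split with scikit-learn; the use of train_test_split and the variable names are my assumption about the setup, not something given in the question.
from sklearn.model_selection import train_test_split

# stratify=y preserves the class proportions, so every label present in y
# appears in both partitions (as long as each class has at least two samples)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)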
