I just started learning machine learning and am facing a problem regarding my logistic regression model. Each input in X is a binary string (i.e. a string of '1's and '0's) 2048 characters long, and len(X) is 165. I need it as a string of 1s and 0s because it's an important input feature.
I got an error message "Input contains NaN, infinity or a value too large for dtype('float64')" and did not know how to resolve the error.
I have already eliminated the NaN values before assigning the values to X. The sum of NaN values is 0.
If anyone has any suggestions on what I can try to solve the error, do let me know. Thank you so much!!
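One likely cause is that scikit-learn is receiving the raw strings rather than numbers. A minimal sketch, with made-up stand-in data (not your real 165 samples), of converting each binary string into a numeric row that estimators such as LogisticRegression can accept:

```python
import numpy as np

# Hypothetical stand-ins for the real 2048-character samples.
bitstrings = ["10" * 1024, "01" * 1024, "1" * 2048]

# One row per sample, one 0/1 float per character: scikit-learn estimators
# need numeric arrays, not raw strings.
X = np.array([[int(c) for c in s] for s in bitstrings], dtype=np.float64)
```

Since every value comes from a '0' or '1' digit, the resulting array cannot contain NaN or infinity.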
Can you post a sample of the data? I can't actually understand what you mean by changing the input to binary; seeing the data will help. Also, please try to format your question well.
I tried to normalize my dataset's columns, but the results for the (daddr) column were not in the [0, 1] range.
The loss also behaves strangely during training.
(Screenshots of the column values, the loss curve, and the code were attached to the original post but are not reproduced here.)
Please tell me what is missing to solve the loss problem, how I can apply Min-Max normalization to all dataset columns, and whether the problem is overfitting or something else.
Normalizing the data is not always necessary. It depends on the model you use. Most of the time, normalizing is necessary when working with a sigmoid or tanh function in your model. Do you really need to normalize the data? Try without normalizing.
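If you do need it, a minimal sketch of Min-Max scaling every column at once with scikit-learn's MinMaxScaler (toy numbers below, not your real daddr data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data standing in for the real columns (e.g. 'daddr').
X = np.array([[10.0, 1.0],
              [250.0, 5.0],
              [3000.0, 9.0]])

# fit_transform rescales every column independently into [0, 1].
X_scaled = MinMaxScaler().fit_transform(X)
```

Each column is scaled by its own min and max, so a column left out of the fit (or scaled with another column's statistics) would explain values falling outside [0, 1].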
I have an unbelievably stupid problem. Calculating precision and recall with scikit-learn gives me crazy values, totally different from the ones I calculated myself from the confusion matrix.
Here's my code:
I also tried average='weighted' and 'macro', and the separate functions f1_score, precision_score and recall_score. Nothing helped.
I got these results:
First there are the y_test values, then y_pred (as you can see, there is only one true positive prediction), then recall and precision calculated from the confusion matrix results (precision 0.14 is what I expected). At the end there are precision and recall calculated by the sklearn functions and... I don't understand! Why the difference?!
Does anyone have idea why these results look like this?
Yeah, that was a veeery stupid problem. The solution was changing average='micro' to average='binary'. Then the results are correct.
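A small sketch with made-up labels showing why the setting matters: average='binary' scores only the positive class, matching the hand computation from the confusion matrix, while average='micro' aggregates true positives over both classes and reduces to plain accuracy here:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

# Toy labels: one true positive and two false positives.
y_test = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 0])

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
p_manual = tp / (tp + fp)                                     # 1 / 3

p_binary = precision_score(y_test, y_pred, average='binary')  # positive class only
p_micro = precision_score(y_test, y_pred, average='micro')    # equals accuracy
```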
I am using the Learning from Data textbook by Yaser Abu-Mostafa et al. I am curious about the following statement in the linear regression chapter and would like to verify that my understanding is correct.
After talking about the "pseudo-inverse" way to get the "best weights" (best for minimizing squared error), i.e w_lin = (X^T X)^-1 X^T y
The statement is: "The linear regression weight vector is an attempt to map the inputs X to the outputs y. However, w_lin does not produce y exactly, but produces an estimate X w_lin which differs from y due to in-sample error."
If the data lies exactly on a hyperplane, won't X w_lin match y exactly (i.e. in-sample error = 0)? In other words, is the above statement only talking about data that does not lie exactly on a hyperplane?
Here, a single 'w_lin' must serve all data points (all pairs of (X, y)) at once.
The linear regression model finds the best possible weight vector 'w_lin' considering all data points, such that X w_lin is as close as possible to 'y' for every data point.
Hence the error will not be zero unless all data points lie exactly on one hyperplane.
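A small numpy sketch of this point, using random toy data (not from the book): when y lies exactly on a hyperplane of X, the pseudo-inverse solution w_lin = (X^T X)^-1 X^T y reproduces it with zero in-sample error, and as soon as noise is added it no longer can:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
w_true = np.array([2.0, -1.0, 0.5])

# Exactly linear targets: the least-squares fit recovers them perfectly.
y_exact = X @ w_true
w_lin = np.linalg.pinv(X) @ y_exact          # pseudo-inverse solution
exact_fit = np.allclose(X @ w_lin, y_exact)  # in-sample error ~ 0

# Noisy targets: the single best w_lin can no longer match every point.
y_noisy = y_exact + rng.normal(scale=0.1, size=10)
w_noisy = np.linalg.pinv(X) @ y_noisy
noisy_fit = np.allclose(X @ w_noisy, y_noisy)
```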
The community might not get the whole context unless the book is opened, because not everything the author says may have been covered in your post. But let me try to answer.
Whenever a model is formed, there are certain constants whose values are not known beforehand but which are used to fit the line/curve as well as possible. Also, the equations often contain an element of randomness. Variables that take random values cause some error between the actual and expected outputs.
Suggested reading: Errors and residuals
Running catboost on a large-ish dataset (~1M rows, 500 columns), I get:
Training has stopped (degenerate solution on iteration 0, probably too small l2-regularization, try to increase it).
How do I guess what the l2 regularization value should be? Is it related to the mean values of y, number of variables, tree depth?
Thanks!
I don't think you will find an exact answer to your question, because every dataset differs from the others.
However, based on my experience, values from the range between 2 and 30 are a good starting point.
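One generic way to pick a value from that range is a small validation sweep. CatBoost's l2_leaf_reg plays the same role as the ridge penalty below; this sketch uses a plain-numpy ridge fit on synthetic data as a stand-in, since the right value really does depend on the dataset:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=200)

# Hold out the last 50 rows for validation.
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Sweep the suggested 2..30 range and keep the best validation error.
errors = {lam: float(np.mean((X_va @ ridge_fit(X_tr, y_tr, lam) - y_va) ** 2))
          for lam in [2, 5, 10, 20, 30]}
best_lam = min(errors, key=errors.get)
```

With CatBoost the same loop would refit the model with different l2_leaf_reg values and compare a held-out metric.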
Update: This question is outdated and was asked for a pre-1.0 version of TensorFlow. Do not refer to the answers or suggest new ones.
I'm using the tf.nn.sigmoid_cross_entropy_with_logits function for the loss and it's going to NaN.
I'm already using gradient clipping; in the one place where tensor division is performed I've added an epsilon to prevent division by zero, and the arguments to all softmax functions have an epsilon added as well.
Yet, I'm getting NaN's mid-way through training.
Are there any known issues where TensorFlow does this that I have missed?
It's quite frustrating because the loss is randomly going to NaN during training and ruining everything.
Also, how could I go about detecting if the training step will result in NaN and maybe skip that example altogether? Any suggestions?
EDIT: The network is a Neural Turing Machine.
EDIT 2: Here's the code for gradient clipping:
optimizer = tf.train.AdamOptimizer(self.lr)
gvs = optimizer.compute_gradients(loss)
capped_gvs = [(tf.clip_by_value(grad, -1.0, 1.0), var) if grad is not None else (grad, var)
              for grad, var in gvs]
train_step = optimizer.apply_gradients(capped_gvs)
I had to add the grad is not None condition (the idiomatic form of grad != None) because I was getting an error without it. Could the problem be here?
Potential solution: I've been using tf.contrib.losses.sigmoid_cross_entropy for a while now, and so far the loss hasn't diverged. I will test some more and report back.
Use 1e-4 for the learning rate; that one always seems to work for me with the Adam optimizer. Even if you clip gradients, the loss can still diverge. Another sneaky one is taking a square root: although it is stable for all positive inputs, its gradient diverges as the value approaches zero. Finally, I would check and make sure all inputs to the model are reasonable.
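The square-root point can be seen directly in numpy: sqrt is finite at zero, but its gradient 0.5/sqrt(x) blows up there, and a small epsilon keeps it bounded. A tiny sketch (the epsilon value is an arbitrary choice):

```python
import numpy as np

eps = 1e-8
x = np.array([0.0, 1e-12, 0.5])

with np.errstate(divide='ignore'):
    naive_grad = 0.5 / np.sqrt(x)       # d/dx sqrt(x): infinite at x = 0
safe_grad = 0.5 / np.sqrt(x + eps)      # bounded by 0.5 / sqrt(eps)
```

The same reasoning applies to log(x) near zero, which is why adding an epsilon inside log and softmax arguments is a common fix.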
I know it has been a while since this was asked, but I'd like to add another solution that helped me, on top of clipping. I found that, if I increase the batch size, the loss tends to not go close to 0, and doesn't end up (as of yet) going to NaN. Hope this helps anyone that finds this!
In my case, the NaN values were the result of NaNs in the training datasets. While I was working on a multiclass classifier, the problem was a DataFrame positional filter on the one-hot-encoded labels.
Fixing the target dataset resolved my issue - hope this helps someone else.
Best of luck.
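Along the same lines, a cheap guard before training is to drop (or fail fast on) any non-finite rows; a minimal numpy sketch with made-up data:

```python
import numpy as np

# Toy feature matrix with one bad row.
X = np.array([[0.1, 0.2],
              [np.nan, 0.4],
              [0.3, 0.5]])

mask = np.isfinite(X).all(axis=1)   # True only for fully finite rows
X_clean = X[mask]                   # drop the NaN row before training
```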
For me, adding an epsilon to the parameters inside a log function fixed it.
I no longer see the errors, and I noticed a moderate increase in the model's training accuracy.