Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I would like to understand what the best way is to conduct further analysis on a trained TensorFlow neural network for regression.
Specifically, I am looking for a way to find maxima/minima of a trained neural network (equivalent to finding the max/min of a regression curve). The obvious, easy way is to "try out" all possible input combinations and check the result set for a max/min, but testing all combinations quickly becomes a huge resource sink when there are multiple inputs and dependent variables.
Is there any way to use a trained TensorFlow neural network to conduct these further analyses?
Just as networks are trained incrementally, you can also search for a maximum incrementally.
Suppose you have a neural network with an input size of 100 (e.g. a 10x10 image) and a scalar output of size 1 (e.g. the score of the image for a given task).
You can incrementally modify the input, starting from random noise, until you obtain a local maximum of the output. All you need is the gradients of the output with respect to the input:
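import tensorflow as tf  # TensorFlow 1.x graph-mode API
# `model` is your trained network (100 inputs -> 1 scalar output). Start from random noise.
x = tf.Variable(tf.truncated_normal([100], mean=127.5, stddev=127.5 / 2.))
output = model(x)
grad = tf.gradients(output, x)[0]  # tf.gradients returns a list; take its only element
learning_rate = 0.1
update_op = x.assign_add(learning_rate * grad)  # one gradient ascent step on the input
# After initialising variables, run update_op repeatedly in a tf.Session until the output stops increasing.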
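To make the gradient-ascent idea concrete without depending on a particular TensorFlow version, here is a minimal NumPy sketch. The two-layer tanh network and its random weights are hypothetical stand-ins for a trained model; with a real model you would get the input gradient from the framework instead of the hand-written `grad_f`. The input is clipped to an assumed valid range `[LO, HI]` (projected gradient ascent), and several random restarts are used because gradient ascent only finds local optima. Flipping the sign of the step gives a local minimum instead.

```python
import numpy as np

# Hypothetical stand-in for a trained regression network with 4 inputs and a
# scalar output: f(x) = w2 . tanh(W1 x + b1) + b2. With a real model, the
# weights (and the input gradient) would come from the framework instead.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(16, 4))
b1 = rng.normal(scale=0.1, size=16)
w2 = rng.normal(scale=0.5, size=16)
b2 = 0.0
LO, HI = -3.0, 3.0  # assumed valid input range (e.g. pixel limits for images)


def f(x):
    """Network output for one input vector x of shape (4,)."""
    return float(w2 @ np.tanh(W1 @ x + b1) + b2)


def grad_f(x):
    """Gradient of the output w.r.t. the input, via the chain rule through tanh."""
    h = np.tanh(W1 @ x + b1)
    return W1.T @ (w2 * (1.0 - h ** 2))


def optimize_input(x0, lr=0.02, steps=1000, maximize=True):
    """Projected gradient ascent (or descent) on the input, clipped to [LO, HI]."""
    sign = 1.0 if maximize else -1.0
    x = np.clip(np.asarray(x0, dtype=float), LO, HI)
    for _ in range(steps):
        x = np.clip(x + sign * lr * grad_f(x), LO, HI)
    return x


# Gradient ascent only finds a *local* optimum, so restart from several random points.
starts = rng.uniform(LO, HI, size=(10, 4))
best_max = max((optimize_input(s) for s in starts), key=f)
best_min = min((optimize_input(s, maximize=False) for s in starts), key=f)
print("approx. max:", f(best_max), "at", best_max)
print("approx. min:", f(best_min), "at", best_min)
```

This evaluates the network a few thousand times instead of over a full grid of input combinations, at the cost of only guaranteeing local optima.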
An ANN is not something that can be checked analytically. It can have millions of weights and thousands of neurons, non-linear activation functions of different types, and convolution and max-pooling layers, so in general there is no way to determine anything about it analytically. That is actually why networks are trained incrementally.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm working on a deep learning classifier (Keras and Python) that classifies time series into three categories. The loss function that I'm using is the standard categorical cross-entropy. In addition to this, I also have an attention map which is being learnt within the same model.
I would like this attention map to be as small as possible, so I'm using a regulariser. Here comes the problem: how do I set the right regularisation parameter? What I want is for the network to reach its maximum classification accuracy first, and then start minimising the intensity of the attention map. For this reason, I train my model once without the regulariser and a second time with the regulariser on. However, if the regularisation parameter (lambda) is too high, the network completely loses accuracy and only minimises the attention, while if it is too small, the network only cares about the classification error and won't minimise the attention, even when the accuracy is already at its maximum.
Is there a smarter way to combine the categorical cross-entropy with the regulariser? Maybe something that considers how the categorical cross-entropy changes over time and, if it doesn't go down for, say, N iterations, only considers the regulariser?
Thank you
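The plateau-triggered idea in the question can be sketched in plain Python. `PlateauRegSchedule` and `total_loss` are hypothetical helpers, not Keras APIs: the regulariser weight stays at 0 until the cross-entropy has not improved for `patience` epochs, then switches to `target_lambda`, which you still have to choose. Unlike the question's suggestion, this sketch keeps the cross-entropy term after the switch so the classifier is not free to forget its accuracy; that is a design choice, not a requirement.

```python
class PlateauRegSchedule:
    """Hypothetical helper: keep the regulariser weight at 0 until the
    cross-entropy has not improved by more than `min_delta` for `patience`
    consecutive epochs, then switch it on at `target_lambda` (and keep it on)."""

    def __init__(self, target_lambda, patience=5, min_delta=1e-4):
        self.target_lambda = target_lambda
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0
        self.lam = 0.0

    def update(self, ce_loss):
        """Call once per epoch with the (validation) cross-entropy; returns lambda."""
        if self.lam == 0.0:
            if ce_loss < self.best - self.min_delta:
                self.best = ce_loss
                self.wait = 0
            else:
                self.wait += 1
                if self.wait >= self.patience:
                    self.lam = self.target_lambda
        return self.lam


def total_loss(ce_loss, attention_penalty, lam):
    """Combined objective: cross-entropy plus lambda times the attention penalty."""
    return ce_loss + lam * attention_penalty


sched = PlateauRegSchedule(target_lambda=0.1, patience=3)
ce_history = [1.0, 0.6, 0.4, 0.35, 0.35, 0.351, 0.352, 0.34]  # made-up losses
lambdas = [sched.update(ce) for ce in ce_history]
print(lambdas)
```

In Keras, one way to wire this up would be to keep lambda in a backend variable read by the loss and update it from a `tf.keras.callbacks.Callback` `on_epoch_end` hook; that wiring is not shown here.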
Regularisation is a way to fight overfitting, so you should first find out whether your model overfits. A simple way to do that is to compare the F1 score on the train and test sets: if the F1 score on train is high and on test is low, you probably have overfitting, and then you need to add some regularisation.
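The train/test F1 comparison can be sketched with a small macro-F1 function (per-class F1 = 2TP / (2TP + FP + FN), averaged over classes). The labels below are made up; with scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` should give the same numbers. How large a gap counts as overfitting is a judgment call.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up predictions for a 3-class problem.
train_true, train_pred = [0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]
test_true, test_pred = [0, 1, 2, 0, 1, 2], [0, 2, 2, 1, 1, 0]

train_f1 = macro_f1(train_true, train_pred)
test_f1 = macro_f1(test_true, test_pred)
print(f"train F1 = {train_f1:.2f}, test F1 = {test_f1:.2f}, gap = {train_f1 - test_f1:.2f}")
```

A large gap like this one (perfect on train, much worse on test) is the overfitting signal the answer describes.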
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
We are using facenet and have generated embeddings (128 features) for faces https://github.com/davidsandberg/facenet. We have 100k classes (celebrities) from MSCeleb http://www.msceleb.org/ and 8M samples.
How does one construct a neural network that can map the 128 features to 100k classes?
Using a fully connected layer would result in (128 + 1)*100k = 12.9 million parameters which seems too large to train.
From the FaceNet abstract:
In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.
Instead of training a classifier, consider doing a nearest-neighbor search in the feature space. You can select anchor images for each of your 100k celebrities and then build a k-d tree from their feature vectors. Then, for each input, you can find its nearest neighbor in the k-d tree.
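A minimal sketch of the k-d tree lookup, assuming SciPy is available (`scipy.spatial.cKDTree`). The anchor embeddings are random, L2-normalised 128-D vectors standing in for real FaceNet anchors, the number of classes is scaled down to 1,000, and `max_distance` is a hypothetical threshold for rejecting faces that match no anchor. Note that in 128 dimensions a k-d tree often loses much of its speed advantage over brute-force search, so approximate nearest-neighbor libraries are a common alternative at the 100k scale.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
n_classes, dim = 1000, 128  # small stand-in for 100k classes of 128-D embeddings

# Hypothetical anchor embeddings: one L2-normalised vector per celebrity.
anchors = rng.normal(size=(n_classes, dim))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)
anchor_labels = np.arange(n_classes)  # class id of each anchor row

tree = cKDTree(anchors)  # build once, reuse for every query


def identify(embeddings, max_distance=np.inf):
    """Class id of the nearest anchor for each row, or -1 if farther than max_distance."""
    dist, idx = tree.query(embeddings, k=1)
    labels = anchor_labels[idx]
    return np.where(dist <= max_distance, labels, -1)


# Queries: noisy copies of a few anchors, re-normalised like real embeddings.
queries = anchors[[3, 42, 999]] + 0.05 * rng.normal(size=(3, dim))
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
print(identify(queries))
```

No parameters are trained here: adding a new celebrity only means adding an anchor embedding and rebuilding the tree.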
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
In deep learning we should choose the best model according to the train/val loss and accuracy, but how do I know which point is the best?
Does it only depend on the val accuracy regardless of the other metrics?
And two more relevant questions:
What do the optimal train/val loss and accuracy curves look like?
What should I do if the train loss is decreasing and train accuracy is increasing, but the val loss is increasing while val accuracy stops increasing after training a long time?
It looks like this: [Figure: plots of train accuracy, train loss, val accuracy and val loss]
First of all, you should choose the model according to its results on the development/validation dataset, so val accuracy and val loss are what you use to judge the model's performance.
To some extent, higher val accuracy is often associated with lower val loss, because the loss measures the difference between the predicted result and the ground truth.
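The association is only partial, which also explains the question's curves. Cross-entropy penalises *how confident* a wrong prediction is, while accuracy only counts whether the arg-max is right, so val loss can rise while val accuracy stays flat once the model becomes overconfident on the examples it gets wrong. The probabilities below are made up to show this for two epochs with the same correct/incorrect pattern.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Mean categorical cross-entropy for predicted class probabilities."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))


def accuracy(probs, labels):
    """Fraction of rows whose arg-max class equals the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))


labels = np.array([0, 1, 1, 0])
# Epoch A: the two mistakes are made with mild confidence.
probs_a = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]])
# Epoch B: same correct/incorrect pattern, but the mistakes are very confident.
probs_b = np.array([[0.8, 0.2], [0.3, 0.7], [0.99, 0.01], [0.01, 0.99]])

print(accuracy(probs_a, labels), accuracy(probs_b, labels))  # 0.5 0.5
print(cross_entropy(probs_a, labels), cross_entropy(probs_b, labels))  # B is much higher
```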
Different problems are measured by different metrics (for example, BLEU score is commonly used in machine translation), so you may need to read some papers in your research field to find out which metrics are popular.
A decreasing train loss together with an increasing val loss is quite common in model training, and it usually means your model is overfitting: it learns too many features that appear only in the training dataset rather than in the data as a whole.
As for dealing with overfitting, there are many methods, such as early stopping, dropout layers, etc. You can just google them.
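Early stopping also answers "which point is the best": keep the epoch with the lowest val loss and stop once it has not improved for a few epochs. A plain-Python sketch of that logic over a made-up val-loss curve (epochs are 0-indexed; `patience` and `min_delta` are assumed settings). In Keras, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)` implements this kind of rule; you could monitor val accuracy instead if that is your target metric.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Walk through per-epoch validation losses as training would produce them.

    Returns (best_epoch, stop_epoch): the epoch whose weights you would keep and
    the epoch at which training stops because val loss has not improved for
    `patience` consecutive epochs (or the last epoch if that never happens).
    """
    best_epoch, best_loss, wait = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


val_loss = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.68]  # made-up curve
print(early_stopping(val_loss, patience=3))  # (3, 6)
```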
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am new to Machine Learning. Can anyone tell me the major difference between classification and regression in machine learning?
Regression aims to predict a continuous output value. For example, say that you are trying to predict the revenue of a certain brand as a function of many input parameters. A regression model would literally be a function which can output potentially any revenue number based on certain inputs. It could even output revenue numbers which never appeared anywhere in your training set.
Classification aims to predict which class (a discrete integer or categorical label) the input corresponds to. For example, let us say that you had divided the sales into Low and High sales, and you were trying to build a model which could predict Low or High sales (binary/two-class classification). The inputs might even be the same as before, but the output would be different. In the case of classification, your model would output either "Low" or "High", and in theory every input would generate only one of these two responses.
(This answer is true for any machine learning method; my personal experience has been with random forests and decision trees).
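A minimal sketch of the revenue example with made-up numbers: the same input feeds a regression model (a least-squares line via `np.polyfit`) that returns a continuous value, and a classifier that returns one of two labels. The 1-nearest-neighbour classifier and the 7.0 Low/High threshold are illustrative choices, not the random forests mentioned above.

```python
import numpy as np

# Made-up data: advertising spend (input) and revenue (output), arbitrary units.
spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
revenue = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])

# Regression: fit a line; the prediction is a continuous number.
slope, intercept = np.polyfit(spend, revenue, deg=1)
pred = slope * 7.5 + intercept
print(pred)  # a revenue value that never appears in the training set

# Classification: same inputs, but the targets are discrete labels.
labels = np.where(revenue >= 7.0, "High", "Low")


def classify(x):
    """1-nearest-neighbour classifier: label of the closest training input."""
    return labels[np.argmin(np.abs(spend - x))]


print(classify(7.5), classify(1.2))  # High Low
```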
Regression - the output variable takes continuous values.
Example: Given a picture of a person, we have to predict their age on the basis of the given picture.
Classification - the output variable takes class labels.
Example: Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
I am a beginner in the machine learning field, but as far as I know, regression is for "continuous values" and classification is for "discrete values". With regression, there is a line for your continuous values, and you can see whether your model is a good or bad fit. With classification, on the other hand, you can see how discrete values gain meaning "discretely". If I am wrong, please feel free to correct me.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I have a question regarding data preprocessing for machine learning. Specifically transforming the data so it has zero mean and unit variance.
I have split my data into two datasets (I know I should have three, but for the sake of simplicity let's say I have two). Should I transform my training dataset so that the entire training set has zero mean and unit variance, and then, when testing the model, transform each test input vector so that that particular vector has zero mean and unit variance? Or should I just transform the entire dataset (training and testing) together so that the whole thing has zero mean and unit variance? My belief is that I should do the former, so that I won't introduce a significant amount of bias into the test dataset. But I am no expert, hence my question.
Your preprocessor should be fitted only on the training set; the mean and variance computed there are then used to transform the test set. Computing these statistics on train and test together leaks information about the test set.
Let me link you to a good course on deep learning and show you a quote from it (both from Andrej Karpathy):
Common pitfall. An important point to make about the preprocessing is that any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. Instead, the mean must be computed only over the training data and then subtracted equally from all splits (train/val/test).
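A minimal NumPy sketch of "fit on train, apply everywhere", with made-up data. Standardisation here is per feature (per column, across training samples), not per individual test vector, and the test rows are transformed with the training mean and standard deviation, so their mean is close to, but not exactly, zero. scikit-learn's `StandardScaler().fit(X_train)` followed by `.transform(X_test)` does the same thing.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(800, 3))  # made-up features
X_test = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

# Fit: statistics come from the training split only, one value per feature (column).
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant features

# Transform: the *same* training statistics are applied to every split.
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std

print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
print(X_test_std.mean(axis=0).round(2))  # close to 0, but not forced to be exactly 0
```

Because each test row is transformed with fixed training statistics, its result does not depend on which other test rows are present.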