I understand that predict_generator outputs probabilities. To get the class, I find the index of the greatest probability, which should be the most probable class. However, after doing this I get a different output than if I call predict_classes. I do not understand why. Can someone explain this, please?
The generator in Keras lists the class folders in alphabetical order, so the class indices follow that order. You can save the classes used during training with:
import json
# save classes to JSON
class_json = json.dumps(train_generator.class_indices)
with open("class.json", "w") as class_file:
    class_file.write(class_json)
The samples are shuffled within the batch generator, so when fit_generator or evaluate_generator requests a batch, random samples are returned.
Another possibility, if this is being done on images, is not to use rescale=1./255 in ImageDataGenerator, as mentioned in https://github.com/fchollet/keras/issues/3477
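To make the mapping from probabilities to class names concrete, here is a minimal sketch. It uses plain Python lists in place of the array returned by predict_generator; the class names and probability values are made-up examples, and probabilities_to_class_names is a hypothetical helper, not a Keras function. The rows only line up with the file order if the generator was created with shuffle=False.

```python
def probabilities_to_class_names(probabilities, class_indices):
    """Return the most probable class name for each row of probabilities.

    class_indices maps class name -> column index, in the same form as
    train_generator.class_indices.
    """
    # Invert the mapping so a column index can be looked up directly.
    index_to_name = {index: name for name, index in class_indices.items()}
    names = []
    for row in probabilities:
        # argmax: position of the largest probability in this row
        best_index = max(range(len(row)), key=row.__getitem__)
        names.append(index_to_name[best_index])
    return names


# Made-up example values (assumptions, not taken from a real model).
example_class_indices = {"cats": 0, "dogs": 1, "horses": 2}
example_probabilities = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
]
print(probabilities_to_class_names(example_probabilities, example_class_indices))
```

With a real model, the same inversion of class_indices applies to the argmax of each prediction row.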
Hope that helps!
Currently I am using a CSV file converted from a pcap. I took the Length column from my CSV file and used it as my embedding. The code runs and I get accuracy in the high 70s, around 77 percent. I am just not sure whether this is an appropriate choice for an embedding. Some data sets also give odd results with the warning "UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples". I know people have answered this question before, but I tried all of their methods and still have no clue why my model works for some data sets and not others. If someone could confirm whether what I am doing makes sense, that would really help me.
CSV file snapshot for reference
import torch
from sklearn.preprocessing import StandardScaler
df['embeddings'] = df['Length']
embeddings = torch.from_numpy(df['embeddings'].to_numpy())
# standardize the length values (zero mean, unit variance)
scale = StandardScaler()
embeddings = scale.fit_transform(embeddings.reshape(-1, 1))
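For context on the warning quoted above: scikit-learn raises it when a label never appears among the predictions, so that label's precision is 0/0 and its F-score has no defined value. The following is a minimal pure-Python sketch of that situation. The labels are made up, and per_label_f1 is a hypothetical helper, not a scikit-learn function.

```python
def per_label_f1(y_true, y_pred):
    """Compute the F1 score per label; None marks an undefined score."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        if predicted == 0 or actual == 0:
            # Precision (tp / predicted) or recall (tp / actual) is 0/0 here.
            scores[label] = None
            continue
        precision = tp / predicted
        recall = tp / actual
        if precision + recall == 0:
            scores[label] = 0.0
        else:
            scores[label] = 2 * precision * recall / (precision + recall)
    return scores


# Made-up labels: the model never predicts class 2.
example_true = [0, 0, 1, 1, 2, 2]
example_pred = [0, 0, 1, 1, 1, 0]
print(per_label_f1(example_true, example_pred))
```

Counting how often each label occurs in the predictions for a failing data set shows whether the model has collapsed onto a subset of the classes.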
I am a complete beginner in Deep Learning and Keras. I want to build a hierarchical attention network that classifies comments into several categories, viz. toxic, severely toxic, etc. I took the code from an open repository and saved the model. I then loaded the model using model_from_json. Now I wish to use this loaded model to make predictions on input text (given as Python input or as a separate file).
This is the code that I am using: https://www.kaggle.com/sermakarevich/hierarchical-attention-network/notebook
Then I did:
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")
Then in a separate file:
from keras.models import model_from_json
# AttentionWithContext is the custom layer class defined in the notebook above
with open('model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json, custom_objects={'AttentionWithContext': AttentionWithContext})
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
The "Loaded model from disk" message prints fine. I would like to know the format in which I need to give the input, and a code snippet that uses the model to classify it. Since I do not have much knowledge about this, it would be really helpful if someone could help me with the Python-specific code to make it work.
When predicting, make sure you pickle the tokenizer as well; otherwise the output won't be correct.
new = ["Your_text_that_you_want_to_check"]
seq = tokenizer.texts_to_sequences(new)
padded = pad_sequences(seq, maxlen=MAX_SEQUENCE_LENGTH)
pred = model.predict(padded)
When predicting, it is very important to convert your new text to vectors in the same way as the data your model was trained on. I converted my training data to sequences and padded them with zeros so that they all have the same length, and I repeated the same steps when predicting. But make sure you pickle your tokenizer. I hope it helps! Let me know if you have difficulty understanding the steps.
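The pickling step could be sketched as follows, using only the standard-library pickle module. The helper names save_tokenizer and load_tokenizer and the file name tokenizer.pickle are assumptions made for illustration; they are not part of Keras.

```python
import pickle


def save_tokenizer(tokenizer, path):
    """Serialize the fitted tokenizer so prediction can reuse its vocabulary."""
    with open(path, "wb") as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_tokenizer(path):
    """Load a tokenizer previously written by save_tokenizer."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

In the training script you would call save_tokenizer(tokenizer, "tokenizer.pickle") after fitting the tokenizer. In the prediction script, tokenizer = load_tokenizer("tokenizer.pickle") then comes before the texts_to_sequences call, so the word indices match the ones the model was trained with.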
I'm working on 256_ObjectCategories dataset from Caltech. They have organised all the images in 256 categories in different folders. I'm using ImageDataGenerator from Keras to load the dataset but I'm not able to split it into training and testing using the same. How can I do this in a terminal without moving images or changing directories? Any help is appreciated. Thank you. :)
This doesn't seem to be possible out of the box with ImageDataGenerator right now. See this thread: https://github.com/fchollet/keras/issues/5862
User AloshkaD suggests as a workaround that you create an index list with glob: rasterList = glob.glob(os.path.join(path_of_your_image_directory, '*.jpg')), split that programmatically and feed the validation part of that list into the validation_data parameter of fit_generator().
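The "split that programmatically" step could look like the sketch below. It is a minimal illustration, not the code from the linked thread: split_file_list is a hypothetical helper, the 20% validation fraction and the seed are arbitrary choices, and the file names stand in for the result of glob.glob(...).

```python
import random


def split_file_list(paths, validation_fraction=0.2, seed=0):
    """Shuffle paths reproducibly and split them into (train, validation)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    # Sort first so the result does not depend on the order glob returned.
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up file names standing in for the result of glob.glob(...).
example_paths = ["img_{:03d}.jpg".format(i) for i in range(10)]
train_paths, validation_paths = split_file_list(example_paths, validation_fraction=0.2)
print(len(train_paths), len(validation_paths))
```

Running the split once per category folder keeps every class represented in both parts, which matters for a dataset with 256 categories.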
I'm now using the fb torch resnet library from GitHub.
It's my first time using Torch and Lua, so I'm running into some problems.
My goal is to save the feature vector of a specific layer (the last average pooling layer of ResNet) into a file together with the class of the input image. All input images are from the CIFAR-10 dataset.
The file format I want is as follows:
image1.txt := class index of image and feature vector of image 1 of cifar-10
image2.txt := class index of image and feature vector of image 2 of cifar-10
// and so on through all images of cifar-10
I have looked at the sample code extract-features.lua in that GitHub repository.
Because this is my first time with Lua, I find it hard to understand this code and modify it the way I want. I also don't want to save my data in the t7 file format.
How can I access only one specific layer of the network in Torch via Lua (the last average pooling)?
How can I access the values of the layer and the classification result index?
How can I read all the images from the CIFAR-10 db file (t7 batch)?
Sorry for so many questions, but I find Torch hard to use because of the small number of community threads and posts about it. Please bear with me.
How can I access only one specific layer of the network in Torch via Lua (the last average pooling)?
To access a layer, you just have to load the model and get the layer by its integer position. If you run print(model), you will be able to see the position of the last average pooling layer.
model = torch.load(path_to_model):cuda()
avg_pooling_layer = model:get(position_of_the_avg_pooling_layer)
How can I access the values of the layer and the classification result index?
I do not quite understand what you mean by this. If you want to see the output or the weights of a specific layer (following the code above), you need to get these elements from the layer table. Again, to see which elements are available, use print(avg_pooling_layer).
weights = avg_pooling_layer.weight -- weights of the layer (nil for layers without parameters, such as average pooling)
output = avg_pooling_layer.output -- output of the layer (filled in after a forward pass)
How can I read all the images from the CIFAR-10 db file (t7 batch)?
To read the images from a t7 file, use the Torch function torch.load (used above to load the model).
cifar_10 = torch.load("path_to_cifar-10.t7")
Once loaded you could have the training and test set in subtables or functions. Again, print the table and visualize which values are the ones you need to get.
Hope this helps!
I am trying to build a CNN with deeplearning4j using the SVHN dataset (http://ufldl.stanford.edu/housenumbers/); in particular I'm using
Format 2: Cropped Digits
These are MATLAB files, and each one contains a struct with a 4-D tensor and an array of labels. I would like to open them in my deeplearning4j code. While looking around I found the class MatlabRecordReader.java in deeplearning4j/DataVec (https://github.com/deeplearning4j/DataVec/blob/master/datavec-api/src/main/java/org/datavec/api/records/reader/impl/misc/MatlabRecordReader.java), but I can't understand how to use it. Does anybody have experience with this?
Thanks in advance.
Here is a reference for "datavec":
http://deeplearning4j.org/DataVec
So if you look at:
http://nd4j.org/tensor
All of deeplearning4j's neural nets are written using nd4j (MATLAB for Java), so this should be pretty easy to map.
You'll see it more or less maps to MATLAB.
What might be easier is to write out the values as a CSV and reshape them to the proper shape instead. If you use C ordering it should work fine.
If you do that, you can just use the CSVRecordReader.
That MATLAB record reader hasn't been used by a lot of people, and I think it may only work with matrices (it's been a while).
I would try the CSV one first.
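As a rough illustration of the CSV idea, here is a pure-Python sketch of flattening one image in C order (last index varying fastest), writing it as a CSV row, and restoring the shape on the reading side. The helper names, the tiny 2x2x3 image and the choice to put the label in the last column are all assumptions made for the example. MATLAB itself stores arrays column-major, so values exported from MATLAB need to be written in the order the reader expects. In practice an array library would do the flattening.

```python
import csv
import io


def flatten_c_order(nested):
    """Flatten nested lists so that the last index varies fastest (C order)."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten_c_order(item))
    return flat


def reshape_c_order(flat, shape):
    """Rebuild nested lists of the given shape from a C-ordered flat list."""
    if len(shape) == 1:
        return list(flat[:shape[0]])
    step = len(flat) // shape[0]
    return [reshape_c_order(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


# Made-up 2x2 image with 3 channels, indexed as [row][column][channel].
example_image = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
example_label = 5

# Write one CSV row per image: pixel values first, label last.
buffer = io.StringIO()
csv.writer(buffer).writerow(flatten_c_order(example_image) + [example_label])

# Read the row back and restore the original shape.
row = next(csv.reader(io.StringIO(buffer.getvalue())))
values = [int(v) for v in row]
restored_image = reshape_c_order(values[:-1], (2, 2, 3))
restored_label = values[-1]
```

The reshape on the deeplearning4j side must use the same ordering and the same shape as the flattening step, otherwise pixels end up in the wrong channel or position.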