How to load validation data set - machine-learning

How do I save a trained and tested model, and then load it to validate a new dataset (i.e. Validation_dataset)?

Related

Use tested machine learning model on new unlabeled single observation or dataset?

How can I use a trained and tested algorithm (e.g. a machine learning classifier), after it has been saved, on a new observation or dataset whose class I do not know (e.g. ill vs. healthy), based on the predictors used for model training?
I use caret but can't find any code for this.
Many thanks.
After training and testing a machine learning model, you can save it as an .rds file and load it again later:
library(caret) # needed for predict() on caret models and for confusionMatrix()
# Save the fitted model as an .rds file
saveRDS(model_fit, "model.rds")
# Load it back, e.g. in a new R session
my_model <- readRDS("model.rds")
Create a new observation from the same dataset, or use a new dataset:
new_obs <- iris[100, ] # row 100 of the built-in iris dataset
Predict on the new observation:
predicted_new <- predict(my_model, new_obs)
If the true class happens to be known, as it is for iris, you can compare it with the prediction. For genuinely unlabeled data, skip this step:
confusionMatrix(reference = new_obs$Species, data = predicted_new)
table(new_obs$Species, predicted_new)
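For readers who work in Python rather than R, the same save, load and predict pattern can be sketched with the standard library's pickle module. This is only an illustration under stated assumptions: the nearest-centroid "model", the function names fit_centroids and predict, and the training data are all made up here and stand in for a real fitted model.

```python
import os
import pickle
import tempfile


def fit_centroids(rows, labels):
    """Toy 'training': one mean feature vector (centroid) per class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the label of the centroid closest to `row`."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=sq_dist)


# Made-up training data: two predictors, two classes.
train_rows = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]]
train_labels = ["healthy", "healthy", "ill", "ill"]
model = fit_centroids(train_rows, train_labels)

# Save the fitted model, then load it back as a later session would.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)
with open(model_path, "rb") as fh:
    loaded_model = pickle.load(fh)

# A new observation carries predictors only; no class label is needed to predict.
new_obs = [2.9, 3.0]
predicted_class = predict(loaded_model, new_obs)
print(predicted_class)
```

The point is that the new observation only needs the same predictor columns, in the same order, as the training data. A class label is only required if you want to evaluate the prediction afterwards.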

How to extract features from a pytorch pretrained fine-tuned model

I need to extract features from a pretrained (fine-tuned) BERT model.
I fine-tuned a pretrained BERT model in PyTorch using the Hugging Face transformers library. All the training and validation was done on a GPU in the cloud.
At the end of training, I save the model and tokenizer like this:
best_model.save_pretrained('./saved_model/')
tokenizer.save_pretrained('./saved_model/')
This creates the following files in the saved_model directory:
config.json
added_token.json
special_tokens_map.json
tokenizer_config.json
vocab.txt
pytorch_model.bin
I copy the saved_model directory to my computer and load the model and tokenizer like this:
model = torch.load('./saved_model/pytorch_model.bin',map_location=torch.device('cpu'))
tokenizer = BertTokenizer.from_pretrained('./saved_model/')
Now, to extract features, I do the following:
input_ids = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)])
last_hidden_states = model(input_ids)[0][0]
But the last line throws the error TypeError: 'collections.OrderedDict' object is not callable
It seems I am not loading the model properly. Instead of loading the entire model, I think my model = torch.load(...) line is loading an ordered dictionary.
What am I missing here? Am I even saving the model the right way? Please advise.
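Here torch.load() returns a collections.OrderedDict because pytorch_model.bin contains only the model's state dict (its weights), not the model object itself. Check out the recommended way of saving and loading a model's state dict.
Save:
torch.save(model.state_dict(), PATH)
Load:
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
So, in your case, it should be:
import torch
from transformers import BertConfig, BertModel
config = BertConfig.from_pretrained('./saved_model/') # reads config.json
model = BertModel(config)
state_dict = torch.load('./saved_model/pytorch_model.bin', map_location=torch.device('cpu'))
model.load_state_dict(state_dict) # load_state_dict takes the loaded dict, not a file path
model.eval() # to disable dropout
Use the same model class that was used for fine-tuning, otherwise the keys in the state dict may not match. Because the directory was written by save_pretrained, you can also let the library do all of this in one call: model = BertModel.from_pretrained('./saved_model/')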
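The difference between a saved state dict and a model object can be reproduced without PyTorch. The sketch below is only an analogy: TinyModel is a hypothetical stand-in for a PyTorch module, and only its method names state_dict and load_state_dict are borrowed from the real API.

```python
from collections import OrderedDict


class TinyModel:
    """Hypothetical stand-in for a PyTorch module: callable, with a state dict."""

    def __init__(self):
        self.weight = 0.0

    def state_dict(self):
        # Only the parameters are exported, not the object or its methods.
        return OrderedDict(weight=self.weight)

    def load_state_dict(self, state):
        self.weight = state["weight"]

    def __call__(self, x):
        return self.weight * x


trained = TinyModel()
trained.weight = 2.5
saved_state = trained.state_dict()  # what ends up in the weights file

# Calling the saved state directly reproduces the error from the question.
try:
    saved_state(4.0)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# The fix: build the model object first, then copy the saved weights into it.
restored = TinyModel()
restored.load_state_dict(saved_state)
output = restored(4.0)
print(error_message)
print(output)
```

The saved weights are plain data, so something has to supply the architecture again before a forward pass is possible. With a real model, that is what the model class and its config do.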

How to load the saved tokenizer from pretrained model

I fine-tuned a pretrained BERT model in PyTorch using the Hugging Face transformers library. All the training and validation was done on a GPU in the cloud.
At the end of training, I save the model and tokenizer like this:
best_model.save_pretrained('./saved_model/')
tokenizer.save_pretrained('./saved_model/')
This creates the following files in the saved_model directory:
config.json
added_token.json
special_tokens_map.json
tokenizer_config.json
vocab.txt
pytorch_model.bin
Now I have downloaded the saved_model directory to my computer and want to load the model and tokenizer. I can load the model like this:
model = torch.load('./saved_model/pytorch_model.bin',map_location=torch.device('cpu'))
But how do I load the tokenizer? I am new to PyTorch and not sure, because there are multiple files. Perhaps I am not saving the model the right way?
from_pretrained expects the path to the directory that contains the saved files, not the path to a single file. Hence, the correct way to load the tokenizer is:
tokenizer = BertTokenizer.from_pretrained(<Path to the directory containing pretrained model/tokenizer>)
In your case:
tokenizer = BertTokenizer.from_pretrained('./saved_model/')
Here ./saved_model is the directory where you saved your pretrained model and tokenizer.
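Since the tokenizer is loaded from a whole directory, a quick check that the download copied every file can save some confusion. This is a minimal sketch using only the standard library. The helper missing_files is hypothetical, and the list of file names is an assumption taken from the listing in the question; which files a tokenizer actually needs depends on the tokenizer class and library version.

```python
import os
import tempfile


def missing_files(directory, expected_names):
    """Return the expected file names that are absent from `directory`."""
    present = set(os.listdir(directory))
    return [name for name in expected_names if name not in present]


# Assumed check list: tokenizer files named in the question's directory listing.
TOKENIZER_FILES = ["vocab.txt", "tokenizer_config.json", "special_tokens_map.json"]

# Demonstration on a temporary directory that is missing one file.
demo_dir = tempfile.mkdtemp()
for name in ["vocab.txt", "tokenizer_config.json"]:
    open(os.path.join(demo_dir, name), "w").close()

missing = missing_files(demo_dir, TOKENIZER_FILES)
print(missing)
```

For the real directory you would call missing_files('./saved_model/', TOKENIZER_FILES) before loading the tokenizer.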

I trained an Inception model and it has been saved, but how do I avoid training on the dataset again and again?

I have saved my Inception model in PyCharm using the TensorFlow library. Every time I run the project, it starts training on the dataset again. I want to skip training on each run, because once the model has been saved there is no need to train it again and again. How do I know that my model has been saved successfully? How can I use the saved model in the same file?
You can save/restore/load your model using TensorFlow:
Save:
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
with tf.Session(graph=tf.Graph()) as sess:
    ...
    builder.add_meta_graph_and_variables(sess,
                                         [tag_constants.TRAINING],
                                         signature_def_map=foo_signatures,
                                         assets_collection=foo_assets,
                                         strip_default_attrs=True)
...
builder.save()
Load:
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, [tag_constants.TRAINING], export_dir)
    ...
For further reference: TensorFlow Guide on Saving a Model
Once you have saved your model, files are written to your directory, for example with the extension .yaml, .h5 or .meta (for the graph), depending on how you saved it. As a sanity check, you can restore the model from the saved files and check its accuracy.
There are nice tutorials on this:
https://www.tensorflow.org/guide/saved_model
http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/
If you use the Keras API to build your model, this link is useful for saving and restoring: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
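To skip training on later runs, the usual pattern is to check whether the saved file exists and only train when it does not. The sketch below shows that control flow with the standard library only. Here pickle and the train function are hypothetical stand-ins for the framework's own save and load calls and for the real training step, and load_or_train is a name made up for this example.

```python
import os
import pickle
import tempfile


def load_or_train(model_path, train_fn):
    """Load a saved model if the file exists; otherwise train once and save it."""
    if os.path.exists(model_path):
        with open(model_path, "rb") as fh:
            return pickle.load(fh), "loaded"
    model = train_fn()
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    return model, "trained"


training_runs = []


def train():
    # Hypothetical stand-in for the expensive training step.
    training_runs.append(1)
    return {"weights": [0.1, 0.2, 0.3]}


path = os.path.join(tempfile.mkdtemp(), "model.pkl")
first_model, first_status = load_or_train(path, train)    # trains and saves
second_model, second_status = load_or_train(path, train)  # loads, no retraining
print(first_status, second_status, len(training_runs))
```

The same file check also answers whether the model was saved: if the file exists and loading it gives a model that predicts as well as the one you trained, the save worked.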

Saving lstm language model in Torch

I am using the lstm language model in https://github.com/wojzaremba/lstm/blob/master/main.lua
I want to save the model at the end of training for later use. I added the following line at the end of training
torch.save(params.model_file, model)
Which seems to successfully save the model. However, when I try to load that model and test it, I get a very large perplexity. Just for testing, I ran a small training instance, which resulted in a test set perplexity of 134, then saved the model. I then loaded the saved model and applied exactly the same testing method (function run_test) on the same test set, but I got a huge perplexity of 71675.134 (even using random weights gives much lower perplexity than that!). I tried saving and loading only the weights, converting them to float() before saving, or saving them as cudaTensors, and all gave me the same result.
Here is the code for loading and testing after saving the whole model; I only modified the main method from the original main.lua:
local function main()
  g_init_gpu(arg)
  print('loading model from file ' .. params.model_file)
  model = torch.load(params.model_file)
  state_test = {data=transfer_data(ptb.testdataset(params.batch_size))}
  reset_state(state_test)
  run_test()
end

Resources