Get the training data from the trained model's pickle file - machine-learning

Can I get the training data used for training a model in sklearn from the pickle file of the trained model?

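In general, no. A pickle file contains only what the fitted estimator object stores, and for most models that is a set of learned parameters rather than the training data.
A trained model is a representation of the relationship between the dependent variable and the independent variables, which is used to predict the dependent variable for any new record. There are exceptions: instance-based models such as k-nearest neighbours keep the training samples in order to make predictions, and a support vector machine keeps its support vectors, which are a subset of the training rows. So what you can recover depends on the type of model.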
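To see why the answer depends on the model, it helps to remember that pickle serializes the attributes of the fitted object and nothing else. Here is a minimal sketch using two toy classes, `ToyLine` and `ToyNearestNeighbour`. Both are hypothetical stand-ins written for illustration, not sklearn estimators. The first keeps only fitted parameters, the second keeps the samples it was fitted on.

```python
import pickle


class ToyLine:
    """Hypothetical 1-D least-squares line; stores only slope and intercept."""

    def fit(self, xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope_ = sxy / sxx
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, x):
        return self.slope_ * x + self.intercept_


class ToyNearestNeighbour:
    """Hypothetical 1-nearest-neighbour model; must store the training samples."""

    def fit(self, xs, ys):
        self.xs_ = list(xs)
        self.ys_ = list(ys)
        return self

    def predict(self, x):
        nearest = min(range(len(self.xs_)), key=lambda i: abs(self.xs_[i] - x))
        return self.ys_[nearest]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Round-trip both fitted models through pickle, as if loading them from a file.
line = pickle.loads(pickle.dumps(ToyLine().fit(xs, ys)))
knn = pickle.loads(pickle.dumps(ToyNearestNeighbour().fit(xs, ys)))

# vars() lists everything that survived pickling.
line_state = vars(line)
knn_state = vars(knn)

print(sorted(line_state))  # only the fitted parameters
print(sorted(knn_state))   # the stored training samples
```

For a real pickled estimator, the same inspection with `vars(model)` shows which attributes were saved, and therefore whether any training rows are in the file.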
Hope this helps.

Related

How to extract weights from trained tensorflow object detection api model

I am using the TensorFlow Object Detection API to train a couple of models (SSD and Faster R-CNN) on a custom dataset. Everything works well, but I want to know how to extract the convolutional and classification weights so that I can load them into a corresponding external model (for instance, a Keras model with convolutional and fully connected layers). I've read about the meta architectures (SSDMetaArch and FasterRCNNMetaArch) and about restoring checkpoints, but I am not yet sure how to do this for my purpose.
I need this because I want to use something like CAM or Grad-CAM to visually check what the model learns for every class in my dataset.
Thank you.

Is there a way to finetune yolo_v4 with transfer learning toolkit v3.0?

I am quite new to nvidia-tlt. So far I have trained, pruned and retrained the model on the KITTI dataset, and I can repeat these steps on any dataset in the required KITTI format. What I want to do is take a model previously trained on KITTI and fine-tune it on my own data. The config file has the options pretrained_model_path, resume_model_path and pruned_model_path, so there is no dedicated option for fine-tuning. If I try to use pretrained_model_path, it throws a shape exception:
Invalid argument: Incompatible shapes: [6,29484,3] vs. [6,29484,12]
That error is expected.
Technically, the pretrained model that we download from NGC comes without the final layer, which represents the total number of classes and their respective bounding boxes.
Once you train that model on a dataset, the trained model is saved with that top layer fixed. If you then try to fine-tune the same model with a different number of classes, you will get an error about incompatible shapes.
In that case you need to train the model on the new dataset from the beginning.
If you want to fine-tune the model on a different dataset that has the same classes, you can use the previously trained model.

Do I re-train the model on whole training data

I have an image dataset for multi-class image classification, split into training and testing images. I trained my model on the training data, using an 80-20% train-validation split, and saved it as an .h5 file.
Now, I want to predict the classes for the test images.
Which option is better, and is that always the case?
1. Use the trained model as it is to predict on the test images.
2. Train the saved model on the whole training data (i.e., including the 20% of validation images) and then predict on the test images. But in that case there will be no validation data, so how does the model ensure that the loss is kept to a minimum during training?
If you have already trained the model properly, you do not need to retrain it (unless you are doing something specific with transfer learning). The whole purpose of having test data is to use it as a test case to see how well your model does on unseen data.

Re-train loaded model from pickle file

I have three datasets (train, validation and test) and I am currently using an XGBoost classifier for a classification task.
I trained the XGBClassifier on the train set and saved it as a pickle file to avoid having to re-train it every time. Once I load the model from the pickle file, I am able to use its predict method, but I don't seem to be able to train the model on the validation set or any other new dataset.
Note: I do not get any error output and the JupyterLab cell looks like it is working, but all my CPU cores are idle while the cell runs, so I conclude the model isn't being fitted.
Could this be a problem with XGBoost, or can pickled models not be fitted again after loading?
I had the exact same question a year ago. You can find the question and answer here.
Note, though, that this way you will keep adding "trees" (boosters) to your existing model, using your new data.
It might be better to train a new model on your combined training and validation sets.
Whichever you prefer, you should try both options and evaluate the results to see which fits your data better.

Classifying instances of a set with a Classification Model on WEKA GUI

I am new to data mining and I would like to ask a classification question.
I have trained a classification algorithm in WEKA (GUI), using a training set in ARFF format. I then saved it in Model format for future use.
Now I want to use this classification model in the WEKA GUI to get the predicted class of the instances in another set that is also in ARFF format. Could you please give me instructions on how to do this in WEKA? Unlike the Weka Java API, the GUI version is poorly documented on the web and I couldn't find anything relevant.
Is it possible to store the classified set back in ARFF format, with each '?' in the class attribute replaced by the predicted class label? I need such output files for some computations.
Thank you in advance.
