How to extract features from raw video files like the MELD dataset? - machine-learning

I've trained a model on the MELD dataset. Now I need to preprocess raw video files coming from users, and I don't know how to do that.
I tried to extract MFCCs from the videos the MELD dataset provides, but the result isn't correct.
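A minimal sketch of one common approach: pull the audio track out of each clip with ffmpeg, then compute MFCCs with librosa. The clip name, 16 kHz sample rate, and 40 MFCC coefficients below are placeholder assumptions, not values mandated by MELD:

import subprocess
import librosa

video_path = "dia0_utt0.mp4"   # placeholder name for one clip
audio_path = "dia0_utt0.wav"

# Pull the audio track out of the video as 16 kHz mono WAV (requires ffmpeg).
subprocess.run(["ffmpeg", "-y", "-i", video_path, "-ac", "1",
                "-ar", "16000", audio_path], check=True)

# Compute MFCCs from the extracted waveform.
y, sr = librosa.load(audio_path, sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)   # shape: (40, num_frames)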

Related

How to use a model generated with PyTorch's transfer learning module (.pth) in OpenCV's neural network module (Caffe)?

I have used PyTorch's transfer learning to train a model. It is saved with a .pth extension. I want to use it to recognize objects in video. I have always used OpenCV for video processing, so I want to use it this time too. However, OpenCV's dnn module does not accept models with the .pth extension. Can I generate a model with PyTorch that OpenCV will accept? Or can I use a .pth model in OpenCV?
Here's how I saved the trained model
torch.save(the_model.state_dict(), PATH)  # saves only the parameters, not the architecture
I have read the post "How should I save the model of PyTorch if I want it loadable by OpenCV dnn module", but it was not helpful.
You could always try exporting with ONNX, which looks like it can be read by OpenCV.
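A minimal sketch of that route, using a torchvision ResNet18 as a stand-in for whatever architecture you actually trained (the file names are placeholders):

import torch
import torchvision
import cv2

# Rebuild the architecture, then load the saved state_dict (the .pth file).
model = torchvision.models.resnet18()   # stand-in; use your own architecture
model.load_state_dict(torch.load("the_model.pth", map_location="cpu"))
model.eval()

# Export to ONNX with a dummy input matching your training input shape.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "the_model.onnx")

# Load the exported model with OpenCV's dnn module.
net = cv2.dnn.readNetFromONNX("the_model.onnx")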

Classifying image documents using Machine learning

I want to classify image documents (like passports, driving licences, etc.) using machine learning.
Does anybody have a link or document that would give me an idea of how to approach this task?
What I am thinking of is first converting each document to text format and then extracting the information from the text file. But I can only do that one file at a time.
I want to know how I can perform this on millions of documents.
You don't need to convert the documents to text; you can do this with the images directly.
To do image classification you can build a basic CNN with the Keras library.
https://towardsdatascience.com/building-a-convolutional-neural-network-cnn-in-keras-329fbbadc5f5
This basic CNN will be enough for you to train an image classifier. But if you want state-of-the-art accuracy, I recommend taking a pretrained ResNet50 and fine-tuning it to build your classifier. Besides accuracy, there is another major advantage of using a pretrained network: you'll need less data to train a robust image classifier.
https://engmrk.com/kerasapplication-pre-trained-model/?utm_campaign=News&utm_medium=Community&utm_source=DataCamp.com
The only thing you'll need to change is the number of output classes, from 1000 to the number of classes you want.
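A rough sketch of that head replacement in Keras; the class count of four and the frozen backbone are assumptions to adapt to your documents:

import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 4   # e.g. passport, driving licence, ... (assumption)

# Pretrained backbone without the 1000-class ImageNet head.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False   # freeze the pretrained weights initially

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation="softmax"),   # your own output head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])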

Machine Learning: What is the format of video inputs that I can pass to my machine learning algorithm to analyze input video?

There are certain machine learning algorithms in use that take video files as input. If I have to pull all the videos from YouTube that are associated with a certain tag and provide them as input to this algorithm, what should my input format be?
There is no format in which you can pass a raw video straight to a machine learning algorithm, since it won't understand the contents of the file.
You need to preprocess the video first, and how depends on what you want to do with it. In general you can do something like converting each frame of the video to CSV (the same as preprocessing an image), which you can pass to your machine learning algorithm. If you want to process the frames sequentially, you may want to use a Recurrent Neural Network. Also, if the video has audio, extract its audio time series and combine each part of the time series with its corresponding video frame.
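A minimal sketch of that frame-extraction step with OpenCV; the file name and the 224x224 frame size are placeholder assumptions:

import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")   # placeholder path
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (224, 224))            # fixed spatial size
    frames.append(frame.astype(np.float32) / 255.0)  # normalize to [0, 1]
cap.release()

video = np.stack(frames)   # shape: (num_frames, 224, 224, 3), ready for a model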

Is it possible to use Caffe Only for classification without any training?

Some users might see this as an opinion-based question, but if you look closely, I am trying to explore the use of Caffe purely as a testing (inference) platform, as opposed to its currently popular use as a training platform.
Background:
I have installed all dependencies using Jetpack 2.0 on Nvidia TK1.
I have installed caffe and its dependencies successfully.
The MNIST example is working fine.
Task:
I have been given a convnet with all standard layers. (Not an open-source model.)
The network weights, bias values, etc. are available after training. The training has not been done via Caffe; it is a pretrained network.
The weights and biases are all in the form of MATLAB matrices. (Actually in a .txt file, but I can easily write code to turn them into matrices.)
I CANNOT train this network with Caffe and must use the given weights and bias values ONLY for classification.
I have my own dataset in the form of 32x32 pixel images.
Issue:
In all tutorials, details are given on how to define and train a network, and then use the generated .prototxt and .caffemodel files to validate and classify. Is it possible to implement this network in Caffe and directly use my weights/biases and dataset to classify images? What are the available options here? I am a Caffe virgin, so be kind. Thank you for the help!
The only issue here is:
How to initialize caffe net from text file weights?
I assume you have a 'deploy.prototxt' describing the net's architecture (layer types, connectivity, filter sizes, etc.). The only issue remaining is how to set the internal weights of caffe.Net to predefined values saved as text files.
You can get access to caffe.Net internals; see the net surgery tutorial for how this can be done in Python.
Once you are able to set the weights according to your text files, you can net.save(...) the new weights into a binary .caffemodel file to be used from then on. You do not have to train the net if you already have trained weights, and you can use it to generate predictions ("test").
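A minimal net-surgery sketch along those lines; the layer names ('conv1', 'fc1') and the weight file names are placeholders that must match your deploy.prototxt and your exported MATLAB text files:

import caffe
import numpy as np

net = caffe.Net("deploy.prototxt", caffe.TEST)   # architecture only, random weights

for layer_name in ["conv1", "fc1"]:
    W = np.loadtxt(layer_name + "_weights.txt")
    b = np.loadtxt(layer_name + "_bias.txt")
    # params[layer][0] is the weights blob, params[layer][1] the bias blob.
    net.params[layer_name][0].data[...] = W.reshape(net.params[layer_name][0].data.shape)
    net.params[layer_name][1].data[...] = b.reshape(net.params[layer_name][1].data.shape)

net.save("pretrained.caffemodel")   # reusable binary weights from now on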

HOG Feature Extraction of Arabic Line Images

I am doing a project on writer identification. I want to extract HOG features from line images of Arabic handwriting, and then use a Gaussian Mixture Model for classification.
The link to the database containing the line Images is : http://khatt.ideas2serve.net/
So my questions are as follows:
There are three folders, namely Test, Train and Validate. From which folder do I need to extract the features, and for what purpose should each of the folders be used?
Do we need to extract the features from individual images and merge them, or is there a method to extract the features of all the images together?
Test, Train and Validate
Read this stats SE question: What is the difference between test set and validation set?
This is basic machine learning, so you should probably go back and review your course literature, since it seems like you're missing some pretty important machine learning concepts.
Do we need to extract the features from individual images and merge them, or is there a method to extract the features of all the images together?
It seems, again, like you're missing basic concepts here. Histogram of Oriented Gradients subdivides the image and finds the oriented gradient in each cell. See this SO question for examples of how this looks.
The traditional way of using HOG is: for each image in your training set, extract the HOG features and use them to train an SVM; validate the training with the validation set; then actually use the trained SVM on the test set.
You need to extract the HOG features from each image separately. Furthermore, you have to resize all images to the same size; otherwise your HOG vectors will have different lengths.
You can use the extractHOGFeatures function in MATLAB. See this example.
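If you would rather stay in Python, here is a minimal per-image sketch with scikit-image; the fixed 128x256 size and the HOG parameters are assumptions, and what matters is that every image gets the same size so the vectors have equal length:

import numpy as np
from skimage.feature import hog
from skimage.io import imread
from skimage.transform import resize

def extract_hog(path, size=(128, 256)):
    image = imread(path, as_gray=True)
    image = resize(image, size)   # same size for every image -> same vector length
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# One feature vector per line image; stack them into a matrix for the GMM.
features = np.stack([extract_hog(p) for p in ["line1.png", "line2.png"]])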
