What is MFCC simply? [closed] - machine-learning

I am new to music genre recognition and I am trying to do a project that classifies which genre a given music clip belongs to (I am using GTZAN).
I came across some open-source code on Kaggle and have used its preprocessing
(link)
I saw them using MFCCs, but I need to understand why they chose these values (for example, why 13 coefficients?).
Moreover, I need to understand what an MFCC is. I have basic knowledge of physics, but it is very difficult to understand what it represents and why these values were chosen (I don't need the broad physics behind it, just as simple an explanation as possible, please).
Another question:
MFCC image example
For example, the X axis here represents time, but what do the squares, or the coefficients, on the Y axis represent?
I have tried searching on the internet, but there is a lot of physics and music theory behind it; I need a simple explanation.
Thanks.
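For reference, here is roughly what the preprocessing I am following does, as far as I understand it (a minimal sketch using librosa; the file name is made up):

import librosa

# Minimal sketch of GTZAN-style MFCC extraction (the file name is hypothetical).
y, sr = librosa.load("blues.00000.wav", sr=22050)    # load the 30-second clip
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 coefficients per frame

# Rows (the Y axis in the usual plot) are the 13 coefficients,
# columns (the X axis) are the time frames.
print(mfcc.shape)   # roughly (13, 1290) for a 30-second clip at this sample rate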

Related

Why are linear layers used in Binary Classification with Deep Learning? [closed]

In many examples of binary classification with deep learning, why are linear layers used? I've been trying to find information on the internet about the reason for using linear layers, e.g.
https://github.com/StatsGary/PyTorch_Tutorials/blob/main/01_MLP_Thyroid_Classifier/PyTorch_Binary_From_Scratch.py
https://hutsons-hacks.info/building-a-pytorch-binary-classification-multi-layer-perceptron-from-the-ground-up
A linear layer is just another (slightly mathematically imprecise) name for a fully connected layer, the most standard, classic, and in some sense most powerful building block of neural networks. Networks built from fully connected layers (with nonlinear activations between them) are universal approximators, and thus a good starting point for any sort of investigation.
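For illustration, a minimal sketch of such a network in PyTorch (the layer sizes and data here are made up):

import torch
import torch.nn as nn

# A small MLP for binary classification: linear (fully connected) layers
# with a nonlinear activation in between.
model = nn.Sequential(
    nn.Linear(20, 64),   # fully connected layer: 20 input features -> 64 hidden units
    nn.ReLU(),           # nonlinearity between the linear layers
    nn.Linear(64, 1),    # final linear layer producing a single logit
)

x = torch.randn(8, 20)                          # batch of 8 examples, 20 features each
targets = torch.randint(0, 2, (8, 1)).float()   # binary labels 0/1
logits = model(x)                               # raw scores, shape (8, 1)
loss = nn.BCEWithLogitsLoss()(logits, targets)  # standard binary cross-entropy on logits
probs = torch.sigmoid(logits)                   # probabilities in (0, 1)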

Can we predict y when y value is not numeric? [closed]

I am using a support vector regressor. I want to predict personality, as shown in the screenshot. Is it possible to predict when y is in string format? I used a one-hot encoder but it's not working.
This is not a regression task, but classification. "Not working" is not very informative; normally you'd just map the classes to integers. Either sklearn.preprocessing.LabelEncoder, sklearn.preprocessing.label_binarize().argmax(axis=1), pandas.factorize(), or manual mapping should get the job done.
Worth noting that support vector machines don't handle multiclass problems natively, so you may run into trouble depending on the exact model you use. At least the latest sklearn versions should handle it automatically when using models like sklearn.svm.LinearSVC, building N binary classifiers under the hood.
I'd also recommend getting acquainted with a more elegant way of ensembling SVMs for multiclass problems, using sklearn.multiclass.OutputCodeClassifier().
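A minimal sketch of the LabelEncoder plus LinearSVC route (the features and labels here are made up):

from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

# Hypothetical data: numeric features and string targets (e.g. personality labels).
X = [[5.1, 0.2], [4.9, 1.3], [6.0, 0.4], [5.5, 1.1]]
y = ["introvert", "extrovert", "introvert", "extrovert"]

le = LabelEncoder()
y_enc = le.fit_transform(y)        # maps the strings to integers 0..K-1

clf = LinearSVC()                  # builds one-vs-rest binary classifiers under the hood
clf.fit(X, y_enc)

pred = clf.predict([[5.0, 0.5]])
print(le.inverse_transform(pred))  # map the integer prediction back to its string label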

Robotics Project based on slam algorithm [closed]

I am a complete beginner in robotics. I want to make a robotics project based on SLAM algorithms. I know many algorithms and I am confident I can implement them in any language, but I don't have any background in image processing or hardware. So, can anyone point me to a tutorial on SLAM-based robotics projects (including how the hardware is organized and how the image processing is done), so that after seeing it I can build a SLAM-based robotics project on my own?
In addition, if anyone can point me to a video lecture series on this, it would be very helpful.
Thanks in advance.
I tried to do something similar last year. I created two systems. The first system made use of a camera and a laser to detect objects and determine their location relative to the system itself. The second system was a little robot with tracks (wheels would be better) that used dead reckoning to keep track of its own location relative to its starting location. The techniques worked really well, but unfortunately I did not have the time to combine the two systems. I can, however, provide you with some documentation that was incredibly useful for me at the time.
These tutorials provide information on both the hardware and the software.
Optical Triangulation (detection of objects with a camera and laser) :
http://www.seattlerobotics.org/encoder/200110/vision.htm
Dead Reckoning (a technique to keep track of one's own location) :
http://www.seattlerobotics.org/encoder/200010/dead_reckoning_article.html
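For illustration, a minimal dead-reckoning sketch for a differential-drive robot (the wheel base and encoder readings below are made up):

import math

# Update the pose (x, y, heading) from the distances travelled by the
# left and right wheels since the last update (differential drive).
def update_pose(x, y, theta, d_left, d_right, wheel_base):
    d_center = (d_left + d_right) / 2.0        # distance travelled by the robot's center
    d_theta = (d_right - d_left) / wheel_base  # change in heading
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    theta += d_theta
    return x, y, theta

# Integrate a few made-up encoder readings from the starting pose (0, 0, 0).
pose = (0.0, 0.0, 0.0)
for d_l, d_r in [(0.10, 0.10), (0.05, 0.08), (0.07, 0.07)]:
    pose = update_pose(*pose, d_l, d_r, wheel_base=0.2)
print(pose)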

Are there any existing research in Voice Recognition that can distinguish voice from different people? [closed]

I just came up with an idea that I want to develop into an application to distinguish/auto-detect the voices of different people.
Sample use case: after training on Obama's and Romney's data, the application would be able to detect whenever either one speaks again (not necessarily the same content as in the training data).
I am wondering if there is any existing research on this. (I didn't know how to search for this; I tried a couple of keywords and got no significant results.)
If not, what is a good way to start? How to choose features, data representation, models, etc.
Thanks!
I found Speaker recognition on Wikipedia which in turn linked to An overview of text-independent speaker recognition: From features to supervectors (Kinnunen, Li, 2010).
From the abstract of the paper:
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods.

Is there a good library for cutting out people in a still photo? [closed]

Are there any libraries, in any language, out there to help identify and grab the images of people in a still photo? Something similar in effect to the way the Kinect can isolate users.
Thanks much!
I think it depends very much on the setup (e.g. a simple background with decent lighting vs. a random background with random lighting). If you can make life easier for yourself and isolate a few simpler use cases, that would be great. Still, there are other available methods; look at the plethora of research around pedestrian detection, for example.
One thing I did try, and it works surprisingly well although it is computationally intensive, is the Histogram of Oriented Gradients, implemented in OpenCV as the HOG descriptor. For a still photo this should produce decent results. You can have a look at the OpenCV sample. I also recommend having a look at Ramanan's excellent papers.
Long story short, thanks to years of inspiring research in computer vision, there are quite a few interesting options out there; it's up to you how much detail you are willing to go into. Still, regardless of how clever the algorithms can be, I believe it's far more important to get a decent setup that allows simple and efficient solutions rather than complex solutions that try to cater for every possible situation. Good luck!
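For illustration, a minimal sketch of OpenCV's built-in HOG person detector (the file names here are made up):

import cv2

# Load a still photo and run the default HOG + linear SVM people detector.
img = cv2.imread("people.jpg")

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Returns bounding boxes (x, y, w, h) for detected people and their scores.
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8))
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("people_detected.jpg", img)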
