I am tasked with evaluating the performance of two separate SDK solutions.
What is the suggested way to evaluate which one is better in terms of precision?
thanks
I have 66 features which I'm using to create a classification machine learning model in Python. However, to help prevent issues like overfitting, I was wondering what the best way to reduce the number of features would be. I have read about PCA, but am not sure whether any good methodology exists to reduce features, and whether any tools exist in sklearn to help facilitate this.
Thanks.
The first thing you should probably do is read through the documentation of scikit-learn's feature selection methods.
Every method has its pros and cons, and which one is best (if there even is one) depends on the specific use case.
That being said, the methods offered in scikit-learn are by no means exhaustive. Discussing the different choices and elaborating on an appropriate method, however, is probably better suited to platforms like Cross Validated.
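As a starting point, here is a minimal sketch of two common options in scikit-learn, univariate selection with SelectKBest and dimensionality reduction with PCA; the synthetic data, feature counts and thresholds are illustrative assumptions only, not recommendations for your data.

    # Minimal sketch: two ways to reduce a 66-feature dataset in scikit-learn.
    # The synthetic data and all parameter choices are placeholders.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = make_classification(n_samples=1000, n_features=66, n_informative=10,
                               random_state=0)

    # Univariate selection: keep the k features most associated with the target.
    selector = SelectKBest(score_func=f_classif, k=20)
    X_selected = selector.fit_transform(X, y)
    print(X_selected.shape)        # (1000, 20)

    # PCA: project onto components explaining most of the variance
    # (unsupervised - it ignores the labels, unlike SelectKBest).
    pca = PCA(n_components=0.95)   # keep 95% of the variance
    X_reduced = pca.fit_transform(X)
    print(X_reduced.shape)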
Is there software out there that optimises the best combination of learning rate, weight ranges and hidden layer structure for a given task, presumably by trying different combinations and discarding the ones that fail? What is this called? As far as I can tell, we just do it manually at the moment...
I know this is not directly code related, but I am sure it will help many others too. Cheers.
This falls under multivariate optimization: pick an optimization algorithm and check the results. Particle Swarm Optimization would do it (there are, however, considerations around using this algorithm), as long as you have a cost function to optimize, for example the error rate of the network output.
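To make that concrete, here is a minimal sketch of a particle swarm searching over a single hyperparameter (the learning rate of a scikit-learn MLPClassifier), with validation error as the cost function. The dataset, swarm size, bounds and coefficients are illustrative assumptions, not recommended values.

    # Minimal sketch: PSO over one hyperparameter (log10 learning rate) of an MLP.
    # Swarm settings, bounds and the synthetic dataset are placeholders.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=400, n_features=20, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    def cost(log_lr):
        """Cost to minimize: validation error rate for a given log10 learning rate."""
        clf = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=10 ** log_lr,
                            max_iter=300, random_state=0)
        clf.fit(X_tr, y_tr)
        return 1.0 - clf.score(X_val, y_val)

    rng = np.random.default_rng(0)
    n_particles, n_iter = 5, 4
    pos = rng.uniform(-4, -1, n_particles)        # search log10(lr) in [-4, -1]
    vel = np.zeros(n_particles)
    pbest, pbest_cost = pos.copy(), np.array([cost(p) for p in pos])
    gbest = pbest[pbest_cost.argmin()]

    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -4, -1)
        costs = np.array([cost(p) for p in pos])
        better = costs < pbest_cost
        pbest[better], pbest_cost[better] = pos[better], costs[better]
        gbest = pbest[pbest_cost.argmin()]

    print("best learning rate ~", 10 ** gbest, "validation error", pbest_cost.min())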
I am trying to implement an SVM in RapidMiner. However, I am presented with several SVM implementations: libsvm, mysvm, JMySVM, Particle Swarm Optimization based SVM and Evolutionary SVM. Now, I know the basic differences between the implementations, but what are their advantages and disadvantages, so I know which one to use?
I am not finding much information about this online, and I would like to avoid a "try them all and see which gives the best results" approach. So I would like to know in which situations I should use each of them.
At first glance, you seem to be confusing different implementations with different algorithms. As far as I know, libsvm, mysvm and JMySVM are standard implementations which solve the SVM optimization problem with algorithms such as sequential minimal optimization.
In contrast, the other SVMs you mentioned (additionally) use less common approaches like particle swarm optimization or evolutionary algorithms for the optimization. Such methods usually give you a good approximation with little effort, which can be advantageous for large-scale problems (but I admit I don't know the exact motivation for their invention).
If you are looking for the SVM model that is common in machine learning and related fields, I would suggest trying libsvm. Alternatively, you can have a look at the collection here.
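Your question is RapidMiner-specific, but if you just want to try the standard libsvm-style SVM quickly outside of it, one easy route is scikit-learn's SVC, which wraps libsvm; the toy data and hyperparameters below are placeholders for illustration only.

    # Minimal sketch: the standard SMO/libsvm-style SVM via scikit-learn's SVC
    # (which wraps libsvm). Data and hyperparameters are placeholders.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # the classic SVM formulation
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))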
I have been asking myself whether most people normally code machine learning algorithms themselves or whether they are likely to use existing solutions like Weka or R packages.
Of course it depends on the problem - but let's say that I want to use a common solution like a neural network. Is there still a reason to code it myself? To understand the mechanism better and adapt it? Or is the idea of standardized solutions more important?
This is not a good question for Stack Overflow. It's an opinion question, not a programming problem.
Nevertheless, here is my take:
It depends on what you want to do.
If you want to find which algorithm works best for your data problem at hand, try ELKI, Weka, R, Matlab, SciPy, whatever. Try out all the algorithms you can find, and spend even more time on preprocessing your data.
If you know which algorithm you need and have to get it into production, many of these tools will not perform well enough or be easy enough to integrate. Instead, check if you can find low-level libraries such as libSVM that provide the functionality you need. If these don't exist, roll your own optimized code.
If you want to do research in this domain, you are best off extending the existing tools. ELKI and Weka have APIs that you can plug into to provide extensions. R doesn't really have an API (CRAN is a mess...) but people just dump their code somewhere and (hopefully) add a manual on how to use it. Extending these frameworks can save you a lot of effort: you have comparison methods ready to use, and you can re-use a lot of their code. ELKI, for example, has a lot of index structures to accelerate algorithms. Most of the time, the index acceleration is much harder to write than the actual algorithm. So if you can reuse the existing indexes, your algorithms will be much faster, too (and you will also benefit from future enhancements to these frameworks).
If you want to learn about existing algorithms, you are better off implementing them yourself. You'll be surprised how much more there is to optimizing some algorithms than what is taught in class. Take APRIORI, for example: the basic idea is quite simple, but getting all the pruning details right is something I'd say only 1 in 20 students manages. If you implement APRIORI, then benchmark it against a known good implementation and try to understand why yours is much slower, you'll actually discover the subtle details of the algorithm. And don't be surprised to see a factor-of-100 performance difference between ELKI, R, Weka etc. - it can still be the same algorithm, just implemented more or less efficiently when it comes to the actual data structures used, memory layout, etc.
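As a rough illustration of that point, here is a bare-bones APRIORI sketch; it captures the candidate-generation and support-counting idea but deliberately leaves out the pruning tricks and data-structure work that dominate real performance. The toy transactions and support threshold are made up for the example.

    # Bare-bones APRIORI sketch: candidate generation + support counting only.
    # Real implementations add far more pruning and better data structures.
    from itertools import combinations

    def apriori(transactions, min_support):
        transactions = [frozenset(t) for t in transactions]
        n = len(transactions)

        def support(itemset):
            return sum(1 for t in transactions if itemset <= t) / n

        # frequent 1-itemsets
        items = {i for t in transactions for i in t}
        current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
        frequent = list(current)

        k = 2
        while current:
            prev = set(current)
            # join: combine frequent (k-1)-itemsets into candidate k-itemsets
            candidates = {a | b for a in current for b in current if len(a | b) == k}
            # prune: every (k-1)-subset of a candidate must itself be frequent
            candidates = [c for c in candidates
                          if all(frozenset(s) in prev for s in combinations(c, k - 1))]
            current = [c for c in candidates if support(c) >= min_support]
            frequent.extend(current)
            k += 1
        return frequent

    # toy usage
    print(apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], min_support=0.5))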
I am working on testing several machine learning algorithm implementations, checking whether they work as efficiently as described in the papers and making sure they can offer great power to our statistical NLP (Natural Language Processing) platform.
Could you show me some methods for testing an algorithm implementation?
1) What aspects?
2) How?
3) Do I have to follow some basic steps?
4) Do I have to consider situations specific to different programming languages?
5) Do I have to understand the algorithm? I mean, does it help if I really know what the algorithm is and how it works?
Basically, we are using C or C++ to implement the algorithms, and our working environment is Linux/Unix. Our testing methods only focus on black-box testing and testing the input/output of functions. I am eager to improve them but I don't have any better ideas right now...
Thanks!
For many machine learning and statistical classification tasks, the standard metrics for measuring quality are precision and recall. Most published algorithms will make some kind of claim about these metrics, or you could implement them and run the tests yourself. This should provide a good indicative measure of the quality you can expect.
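As a minimal sketch of what that measurement looks like (Python here only for brevity; the same counting carries over directly to a C/C++ test harness, and the labels below are invented):

    # Minimal sketch: precision and recall from gold labels and system output.
    # The label vectors are made up; sklearn is used only as a cross-check.
    from sklearn.metrics import precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # gold labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # system output

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    print("precision:", tp / (tp + fp))   # of the predicted positives, how many are correct
    print("recall:   ", tp / (tp + fn))   # of the actual positives, how many were found
    print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))  # cross-check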
When you talk about efficiency of an algorithm, this is usually some statement about the time or space performance of an algorithm in terms of the size or complexity of its input (often expressed in Big O notation). Most published algorithms will report an upper bound on the time and space characteristics of the algorithm. You can use that as a comparative indicator, although you need to know a little bit about computational complexity in order to make sure you're not fooling yourself. You could also possibly derive this information from manual inspection of program code, but it's probably not necessary, because this information is almost always published along with the algorithm.
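One rough way to check such a claim empirically (my own suggestion, not something the published bounds give you) is to time the implementation on inputs of increasing size and look at how the measurements grow:

    # Minimal sketch: rough empirical scaling check. sorted() is a stand-in;
    # replace run() with a call into the implementation under test (e.g. via a
    # subprocess or a binding to your C/C++ code).
    import random
    import time

    def run(data):
        return sorted(data)   # placeholder for the algorithm being evaluated

    for n in [10_000, 20_000, 40_000, 80_000]:
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        run(data)
        print(f"n={n:>6}  time={time.perf_counter() - start:.4f}s")
    # For an O(n log n) claim, doubling n should slightly more than double the time.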
Finally, understanding the algorithm is always a good idea. It makes it easier to know what you need to do as a user of that algorithm to ensure you're getting the best possible results (and indeed to know whether the results you are getting are sensible or not), and it will allow you to apply quality measures such as those I suggested in the first paragraph of this answer.