Interesting metaheuristic algorithms, easy to learn, many uses in real-world applications - heuristics

I have never been interested in optimisation, although almost all of my professors work in it. I have been given a few subjects, one of which is to be used in my thesis (is that the right word?). The result should be an application. So I'm looking for an interesting metaheuristic, evolutionary algorithm, ..., that is not too hard to understand and has various uses. Maybe someone has some experience?
The topics are:
Differential evolution algorithms
Coevolution in metaheuristic algorithms
Multi-objective evolutionary algorithms
...

From my experience, here are some metaheuristic algorithms, ordered from easiest to hardest to learn, together with the results I got from them (again, in my experience):
Hill climbing - bad results
Tabu Search - good results
Great Deluge - bad results
Genetic algorithms - medium results
Simulated Annealing - very good results (if you manage to implement it correctly)
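Since simulated annealing is the one flagged as easy to get wrong, here is a minimal sketch of its core loop. It is only an illustration under assumptions: the toy objective `bumpy`, the neighbour function `step`, the geometric cooling schedule and all parameter values are made up for this example and would need tuning for a real problem.

```python
import math
import random


def simulated_annealing(cost, neighbour, start, t_start=10.0, t_end=1e-3,
                        cooling=0.995, rng=None):
    """Minimise `cost` starting from `start`; returns (best_state, best_cost)."""
    rng = rng or random.Random()
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse move with probability
        # exp(-delta / t), which shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost


def bumpy(x):
    # Hypothetical toy objective with more than one local minimum.
    return x * x + 10.0 * math.sin(x)


def step(x, rng):
    # Hypothetical neighbour move: a small random step left or right.
    return x + rng.uniform(-1.0, 1.0)


best_x, best_val = simulated_annealing(bumpy, step, start=8.0,
                                       rng=random.Random(0))
print(best_x, best_val)
```

The usual places to go wrong are the acceptance rule (the sign of `delta`) and cooling too fast, which turns the method into plain hill climbing.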

Related

How Did Neural Networks Overcome the Bias/Variance Dilemma?

Deep learning has been seen as a rebranding of Neural Networks.
Were the issues presented in the paper "Neural Networks and the Bias/Variance Dilemma" by Stuart Geman et al. ever resolved in the architectures in use today?
We have learned a lot about neural networks, in particular:
we now learn better representations thanks to progress in unsupervised/autoregressive learning, such as restricted Boltzmann machines, autoencoders, denoising autoencoders and variational autoencoders, which help us stabilize the process and learn from reasonable representations
we have better priors, not necessarily in the strict probabilistic sense, but we know, for example, that in image processing a good architecture is the convolutional one, so we have models that are smaller (in terms of parameters) but better suited to the problem. Consequently we are less prone to overfitting.
we have better optimization techniques and activation functions, which help us with underfitting (we can learn larger networks); in particular, we can learn deeper networks. Why is deep often better than wide? Because this is another prior: the assumption that the representation should be hierarchical, and it seems to be a valid prior for many modern problems (though not all of them).
dropout and other techniques brought us better regularization methods (than the previously known and used simple weight priors), which again limits the problem of overfitting (variance).
There are many more things that changed, but in general we were simply able to find better architectures and better assumptions, so we now search in a narrower class of hypotheses. Consequently we overfit less (variance) and underfit less (bias), yet there is still lots to be done!
The next thing is, as @david pointed out, the amount of data. We have huge datasets now, we often have access to more data than we can process in a reasonable time, and obviously more data means less variance: even highly overfitting models start to behave well.
Last, but not least - hardware. This is something that every single deep learning expert will tell you - our computers got stronger. We still use the same algorithms, the same architectures (with many little tweaks, but the core is the same), but our hardware is exponentially faster, and this changes a lot.
@lejlot gave a good overview. I want to point to two specific parts of the whole process.
First, neural networks are universal approximators. That means their bias can in principle be made arbitrarily small. The problem that was thought to be more severe was overfitting, i.e. too much variance.
Now, a common and successful way in Machine Learning to deal with too much variance is by "averaging it away" over many different predictions, which should be as uncorrelated as possible. This worked in Random Forests, for instance, and this is how I tend to understand current Neural Networks as well (particularly the maxout+dropout stuff). Of course, this is a narrow view -- there is also this whole representational learning stuff, the not explaining-away property, etc. -- but it's one I find suitable for your question regarding the bias/variance tradeoff.
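The "averaging it away" effect can be seen in a toy simulation. This is only a sketch under assumptions: each "model" is a made-up unbiased predictor with independent Gaussian noise, which is far more idealised than real networks or trees, whose errors are correlated.

```python
import random
import statistics

rng = random.Random(42)
TRUE_VALUE = 1.0


def noisy_prediction():
    # One hypothetical unbiased but high-variance predictor.
    return TRUE_VALUE + rng.gauss(0.0, 1.0)


def averaged_prediction(n_models):
    # Ensemble prediction: the mean of n independent noisy predictions.
    return statistics.fmean(noisy_prediction() for _ in range(n_models))


single = [noisy_prediction() for _ in range(2000)]
ensemble = [averaged_prediction(25) for _ in range(2000)]

var_single = statistics.pvariance(single)
var_ensemble = statistics.pvariance(ensemble)
print(var_single, var_ensemble)
```

For uncorrelated predictors with equal variance, the variance of the average of n of them is the single-model variance divided by n, so the ensemble variance here should come out far smaller. With correlated predictors the gain is smaller, which is why the predictions should be as uncorrelated as possible.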
Second point: there is no better way to prevent overfitting than having a lot of data. And currently we're in a position to gather a lot of data.

Using machine learning to make a computer learn calculus

Are there any known approaches to making a machine learn calculus?
I've learnt that it is quite simple to teach calculating derivatives because it is possible to implement an algorithm.
Meanwhile, an implementation of integration is possible but is rarely or never fully implemented due to the algorithmic complexity.
I am curious whether there are any academic successes in the field of using machine learning science to evaluate and calculate integrals.
Edit
I am interested in teaching a computer to integrate using neural networks or similar methods.
My personal opinion is that it is not possible to feed enough integration rules into a NN. Why? Because NNs are good for linear regression (AKA approximation) or logistic regression (AKA classification). Integration is neither of them. It is a calculation task that follows strict algorithms. So from this perspective it's a good idea to use mathematical methods to integrate.
Update on 2020-10-23
Right now I'm in the position of being put to shame by new developments reported in the news. Facebook recently announced that they developed some kind of AI that is good at solving integrals.
There are quite a few mathematics software packages that will compute derivatives and integrals for you. Some of the popular ones include MATLAB, Maple, Mathematica, etc. These packages will help you learn quite easily.
As for you making a machine learn calculus ...
You can read up on the following on Wikipedia or in books:
Newton's Method - finds the roots of a function numerically
Monte Carlo Integration - uses RNG to compute numeric integration
Runge Kutta Method - Solves ODE's iteratively
There are many more. These are just the ones I was taught as an undergraduate. They are also fairly simple to understand, depending on your academic level. But in general, people have been trying to numerically compute solutions to models since Newton. Computers have just made everything a lot easier.
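To give a feel for how simple two of the methods listed above are, here is a rough sketch of Newton's method and Monte Carlo integration. Note that these are classical numerical methods, not machine learning. The function names, tolerances and sample counts are my own choices for illustration.

```python
import math
import random


def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeatedly follow the tangent line towards a root."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x


def monte_carlo_integral(f, a, b, n_samples=100_000, rng=None):
    """Estimate the integral of f over [a, b] from uniform random samples."""
    rng = rng or random.Random()
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    # Average function value times the interval length.
    return (b - a) * total / n_samples


# Root of x^2 - 2, i.e. an approximation of sqrt(2).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)

# Integral of sin(x) over [0, pi]; the exact value is 2.
estimate = monte_carlo_integral(math.sin, 0.0, math.pi, rng=random.Random(1))

print(root, estimate)
```

The Monte Carlo estimate is only approximate: its error shrinks roughly with the square root of the number of samples.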

How to approach a machine learning programming competition

Many machine learning competitions are held on Kaggle, where you are given a training set with a set of features and a test set whose output labels are to be predicted using the training set.
It is pretty clear that supervised learning algorithms such as decision trees, SVMs etc. are applicable here. My question is how I should start to approach such problems: should I start with a decision tree, an SVM or some other algorithm, or is there some other approach? In other words, how do I decide?
So, I had never heard of Kaggle until reading your post. Thank you so much, it looks awesome. Upon exploring their site, I found a section that will guide you well. On the competitions page (click all competitions), you will see Digit Recognizer and Facial Keypoints Detection, both of which are competitions that exist for educational purposes, with tutorials provided (a tutorial isn't available for Facial Keypoints Detection yet, as the competition is in its infancy). In addition to the general forums, each competition has its own forum, which I imagine is very helpful.
If you're interested in the mathematical foundations of machine learning, and are relatively new to it, may I suggest Bayesian Reasoning and Machine Learning. It's no cakewalk, but it's much friendlier than its counterparts, without a loss of rigor.
EDIT:
I found the tutorials page on Kaggle, which seems to be a summary of all of their tutorials. Additionally, scikit-learn, a python library, offers a ton of descriptions/explanations of machine learning algorithms.
This cheatsheet http://peekaboo-vision.blogspot.pt/2013/01/machine-learning-cheat-sheet-for-scikit.html is a good starting point. In my experience, using several algorithms at the same time can often give better results, e.g. logistic regression and SVM, where the results of each one are combined with a predefined weight. And test, test, test ;)
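Combining models with predefined weights can be as simple as a weighted average of their scores. The sketch below assumes that both models have already been trained and that each produces a comparable positive-class score in [0, 1] (for an SVM this usually needs some calibration step). The scores, weights and the 0.5 threshold are made up for illustration.

```python
def weighted_ensemble(score_lists, weights):
    """Blend per-sample scores from several models using fixed weights."""
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, sample_scores)) / total
        for sample_scores in zip(*score_lists)
    ]


# Hypothetical positive-class scores for four test samples.
logreg_scores = [0.9, 0.4, 0.2, 0.7]
svm_scores = [0.8, 0.6, 0.1, 0.3]

blended = weighted_ensemble([logreg_scores, svm_scores], weights=[0.6, 0.4])
labels = [1 if score >= 0.5 else 0 for score in blended]
print(blended, labels)
```

The weights themselves are best chosen on held-out validation data rather than on the data the models were trained on.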
There is No Free Lunch in data mining. You won't know which methods work best until you try lots of them.
That being said, there is also a trade-off between understandability and accuracy in data mining. Decision Trees and KNN tend to be understandable, but less accurate than SVM or Random Forests. Kaggle looks for high accuracy over understandability.
It also depends on the number of attributes. Some learners can handle many attributes, like SVM, whereas others are slow with many attributes, like neural nets.
You can shrink the number of attributes by using PCA, which has helped in several Kaggle competitions.

What subjects, topics does a computer science graduate need to learn to apply available machine learning frameworks, esp. SVMs

I want to teach myself enough machine learning so that I can, to begin with, understand enough to put to use available open source ML frameworks that will allow me to do things like:
Go through the HTML source of pages from a certain site and "understand" which sections form the content, which the advertisements and which form the metadata (neither the content nor the ads, e.g. TOC, author bio etc.)
Go through the HTML source of pages from disparate sites and "classify" whether the site belongs to a predefined category or not (the list of categories will be supplied beforehand).
... similar classification tasks on text and pages.
As you can see, my immediate requirements are to do with classification on disparate data sources and large amounts of data.
As far as my limited understanding goes, taking the neural net approach will take a lot more training and maintenance than putting SVMs to use?
I understand that SVMs are well suited to (binary) classification tasks like mine, and open source frameworks like libSVM are fairly mature?
In that case, what subjects and topics does a computer science graduate need to learn right now, so that the above requirements can be solved, putting these frameworks to use?
I would like to stay away from Java, if possible, and I have no language preferences otherwise. I am willing to learn and put in as much effort as I possibly can.
My intent is not to write code from scratch, but, to begin with putting the various frameworks available to use ( I do not know enough to decide which though ), and I should be able to fix things should they go wrong.
Recommendations to learn specific parts of statistics and probability theory would not be unexpected, so say so if required!
I will modify this question if needed, depending on all your suggestions and feedback.
"Understanding" in machine learning is the equivalent of having a model. The model can be, for example, a collection of support vectors, the layout and weights of a neural network, a decision tree, or more. Which of these methods works best really depends on the subject you're learning from and on the quality of your training data.
In your case, learning from a collection of HTML pages, you will want to preprocess the data first; this step is also called "feature extraction". That is, you extract information out of the page you're looking at. This is a difficult step, because it requires domain knowledge and you have to extract useful information, otherwise your classifiers will not be able to make good distinctions. Feature extraction will give you a dataset (a matrix with one row per example and one column per feature) from which you'll be able to create your model.
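As a rough sketch of what feature extraction could look like for HTML, the snippet below turns a fragment into a small numeric feature vector using Python's standard `html.parser`. The chosen features (tag count, link count, amount of text, share of text inside links) are my own guesses at what might separate content from ads or navigation; they are assumptions, not a recommended feature set.

```python
from html.parser import HTMLParser


class BlockFeatures(HTMLParser):
    """Collects a few simple counts from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.n_tags = 0
        self.n_links = 0
        self.text_chars = 0
        self.link_chars = 0
        self._link_depth = 0

    def handle_starttag(self, tag, attrs):
        self.n_tags += 1
        if tag == "a":
            self.n_links += 1
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._link_depth > 0:
            self._link_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        self.text_chars += n
        if self._link_depth:
            self.link_chars += n  # text that sits inside a link


def extract_features(html):
    """Return [tag count, link count, text length, link density]."""
    parser = BlockFeatures()
    parser.feed(html)
    parser.close()
    density = parser.link_chars / parser.text_chars if parser.text_chars else 0.0
    return [parser.n_tags, parser.n_links, parser.text_chars, density]


# Two hypothetical fragments: article text and a link-heavy sidebar.
article = "<div><p>A long paragraph of article text.</p></div>"
sidebar = '<div><a href="/x">Buy now</a> <a href="/y">Click here</a></div>'

dataset = [extract_features(article), extract_features(sidebar)]
print(dataset)
```

Each row of `dataset` is one example, which is the matrix form a classifier expects as input.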
Generally in machine learning it is advised to also keep a "test set" that you do not train your models with, but that you will use at the end to decide on what is the best method. It is of extreme importance that you keep the test set hidden until the very end of your modeling step! The test data basically gives you a hint on the "generalization error" that your model is making. Any model with enough complexity and learning time tends to learn exactly the information that you train it with. Machine learners say that the model "overfits" the training data. Such overfitted models appear good on the training data, but this is just memorization.
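Setting aside a test set takes only a few lines. The following is a minimal sketch: the helper name `train_test_split`, the 20% fraction, the fixed seed and the toy data are all arbitrary choices made for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and set aside a fraction of them as a test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


# Hypothetical dataset: (feature vector, label) pairs.
data = [([i, i % 3], i % 2) for i in range(50)]

train, test = train_test_split(data)
print(len(train), len(test))
```

Fit and tune only on `train`, and evaluate on `test` once at the very end; the score on `test` is then an estimate of the generalization error.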
While software support for preprocessing data is very sparse and highly domain dependent, as adam mentioned Weka is a good free tool for applying different methods once you have your dataset. I would recommend reading several books. Vladimir Vapnik wrote "The Nature of Statistical Learning Theory", he is the inventor of SVMs. You should get familiar with the process of modeling, so a book on machine learning is definitely very useful. I also hope that some of the terminology might be helpful to you in finding your way around.
Seems like a pretty complicated task to me; step 2, classification, is "easy" but step 1 seems like a structure learning task. You might want to simplify it to classification on parts of HTML trees, maybe preselected by some heuristic.
The most widely used general machine learning library (freely) available is probably WEKA. They have a book that introduces some ML concepts and covers how to use their software. Unfortunately for you, it is written entirely in Java.
I am not really a Python person, but it would surprise me if there aren't also a lot of tools available for it as well.
For text-based classification right now Naive Bayes, Decision Trees (J48 in particular I think), and SVM approaches are giving the best results. However they are each more suited for slightly different applications. Off the top of my head I'm not sure which would suit you the best. With a tool like WEKA you could try all three approaches with some example data without writing a line of code and see for yourself.
I tend to shy away from Neural Networks simply because they can get very very complicated quickly. Then again, I haven't tried a large project with them mostly because they have that reputation in academia.
Probability and statistics knowledge is only required if you are using probabilistic algorithms (like Naive Bayes). SVMs are generally not used in a probabilistic manner.
From the sound of it, you may want to invest in an actual pattern classification textbook or take a class on it in order to find exactly what you are looking for. For custom/non-standard data sets it can be tricky to get good results without having a survey of existing techniques.
It seems to me that you are just entering the machine learning field, so I'd really suggest having a look at this book: not only does it provide a deep and broad overview of the most common machine learning approaches and algorithms (and their variations), it also provides a very good set of exercises and links to scientific papers. All of this is written in insightful language and accompanied by a minimal yet useful compendium on statistics and probability.

How to test a Machine Learning or statistical NLP algorithm implementation package?

I am working on testing several machine learning algorithm implementations, checking whether they work as efficiently as described in the papers and making sure they can add real power to our statistical NLP (Natural Language Processing) platform.
Could you show me some methods for testing an algorithm implementation?
1)What aspects?
2)How?
3)Do I have to follow some basic steps?
4)Do I have to consider different specific situations when using different programming languages?
5)Do I have to understand the algorithm? I mean, does it offer any help if I really know what the algorithm is and how it works?
Basically, we are using C or C++ to implement the algorithms, and our working environment is Linux/Unix. Our testing methods only focus on black-box testing and testing the input/output of functions. I am eager to improve them, but I don't have any better ideas right now...
Great Thx!! LOL
For many machine learning and statistical classification tasks, the standard metrics for measuring quality are precision and recall. Most published algorithms will make some kind of claim about these metrics, or you could implement them and run these tests yourself. This should provide a good indicative measure of the quality you can expect.
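Precision and recall are easy to compute yourself from gold labels and the predictions of the implementation under test. Here is a minimal sketch for the binary case; the function name and the example labels are made up for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall of the `positive` class for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Precision: how many predicted positives are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: how many actual positives were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical gold labels and predictions.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 1, 1, 0, 0, 0, 1, 0]

precision, recall = precision_recall(gold, pred)
print(precision, recall)
```

Running the same labelled data through your implementation and comparing these numbers with the figures reported in the paper gives a black-box check of quality, provided the dataset and setup are comparable.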
When you talk about efficiency of an algorithm, this is usually some statement about the time or space performance of an algorithm in terms of the size or complexity of its input (often expressed in Big O notation). Most published algorithms will report an upper bound on the time and space characteristics of the algorithm. You can use that as a comparative indicator, although you need to know a little bit about computational complexity in order to make sure you're not fooling yourself. You could also possibly derive this information from manual inspection of program code, but it's probably not necessary, because this information is almost always published along with the algorithm.
Finally, understanding the algorithm is always a good idea. It makes it easier to know what you need to do as a user of that algorithm to ensure you're getting the best possible results (and indeed to know whether the results you are getting are sensible or not), and it will allow you to apply quality measures such as those I suggested in the first paragraph of this answer.

Resources