Where can I get pseudocode for the C5.0 or C4.5 algorithm? - machine-learning

I want to go through the C5.0 or C4.5 algorithm.
I want to know how it works so that I can better predict where it would give better results.

I think nowhere - C5.0 is proprietary. Use CART instead; it is much better.
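That said, the criterion at the heart of C4.5 is well documented: at each node it splits on the attribute with the highest gain ratio. A minimal sketch of that criterion in plain Python (the function names and the dict-per-row representation are mine, for illustration only, not Quinlan's code):

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    # C4.5's split criterion: information gain divided by split info.
    # rows is a list of dicts mapping attribute name -> discrete value.
    n = len(rows)
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    # Split info penalizes attributes that fragment the data into many values.
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0
```

The full algorithm adds threshold splits for continuous attributes, handling of missing values, and post-pruning on top of this criterion.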

Related

How do we know whether to use MinMax normalization or Mean normalization?

I am trying to figure out when I should go for MinMax normalization and when I should go for Mean normalization.
Both do pretty much the same thing, i.e. restrict the dataset to a range, but how do I know which normalization is better for my dataset?
Please include examples, as they will help with understanding.
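For reference, the two methods differ only in what they subtract; a small numpy sketch on a made-up column:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 100.0])  # toy column with an outlier

# Min-max normalization: rescales into [0, 1].
minmax = (x - x.min()) / (x.max() - x.min())

# Mean normalization: centers around 0, same denominator.
meannorm = (x - x.mean()) / (x.max() - x.min())

print(minmax)    # ~[0.00, 0.11, 0.22, 0.33, 1.00]
print(meannorm)  # ~[-0.33, -0.22, -0.11, 0.00, 0.67]
```

Note that both divide by the range max - min, so both are equally sensitive to outliers; the only difference is whether the result is anchored at 0 or centered on the mean.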

MachineLearning or DecisionTree for job matching?

I'm working on a job-matching app and I was wondering what's the best way to match elements to each other and sort out the best results.
In my mind it's by going through a decision tree, as we already know the structure of the elements and the expected result.
However, would machine learning be an alternative solution, or is it worthless to do so?
I might be mistaken, but to me ML is efficient at sorting data that at first sight doesn't have obvious common points, right?
Thanks for your advice!
A decision tree is part of ML. Maybe you mean a more complex algorithm than a decision tree, like xgboost or a neural network.
xgboost or a neural network is a good choice when you have too many variables for it to make sense to build a decision tree manually.
A decision tree is better when you want to control your algorithm (for example, for ethical or managerial reasons).
xgboost and unsupervised techniques are also good for creating the boundaries used in your decision tree: for example, should you create a category 18-25 or 18-30, etc.
Considering the complexity of such a problem, with time and geographical constraints, using advanced algorithms seems a good idea to me.
Have a look at this Kaggle competition, which seems close to your problem; it may give you some good insight:
https://www.kaggle.com/c/job-recommendation/data
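To make the trade-off concrete, here is a minimal sketch on synthetic data (scikit-learn; I use its GradientBoostingClassifier as a stand-in for xgboost, which exposes the same fit/predict interface):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for job/candidate features and match labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow decision tree: easy to inspect and constrain (e.g. for ethical rules).
tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# Boosted trees: usually stronger with many variables, but much harder to audit.
boost = GradientBoostingClassifier().fit(X_train, y_train)

print("tree: ", tree.score(X_test, y_test))
print("boost:", boost.score(X_test, y_test))
```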

How to apply a genetic algorithm to 2D or multidimensional images for optimisation

I am trying to code a genetic algorithm in Matlab but really don't know how it works on images or how to proceed. Is there any basic tutorial that can help me understand how to apply a GA to images (starting from 2D up to multidimensional images)?
That would be a great help to me.
Thanks to everyone in anticipation.
Kind regards.
For a GA you need two things: a fitness function that can evaluate any solution and tell how good it is, and a representation of your solution so that you can do crossover and mutation. Once you have these, you are good to go. I'm not an expert on image processing, so I can't help you with that part exactly.
Look at the book Essentials of Metaheuristics, which is a very good resource for getting started with evolutionary computation (and not only that) in general. It's free.
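To make those two ingredients concrete, here is a bare-bones GA loop in Python (bit-string representation and a toy fitness; for images you would swap in pixel data or transform parameters and a real quality measure):

```python
import random

def fitness(genome):
    # Placeholder objective: maximize the number of 1-bits.
    # Replace with a measure of solution quality, e.g. similarity
    # between the decoded image and a target image.
    return sum(genome)

def crossover(a, b):
    # Single-point crossover of two bit-string genomes.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rate=0.01):
    # Flip each bit with a small probability.
    return [1 - g if random.random() < rate else g for g in genome]

def evolve(pop_size=50, length=64, generations=100):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: keep the best half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

print(fitness(evolve()))
```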
There is a paper on this subject which you can find at the IEEE library. I believe it solves the problem you vaguely describe.

How to choose the right normalization method for the right dataset?

There are several normalization methods to choose from: L1/L2 norm, z-score, min-max. Can anyone give some insight into how to choose the proper normalization method for a dataset?
I didn't pay much attention to normalization before, but I just had a small project whose performance was heavily affected not by the parameters or the choice of ML algorithm but by the way I normalized the data. Kind of a surprise to me, but this may be a common problem in practice. So, could anyone provide some good advice? Thanks a lot!
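For reference, the three methods mentioned, side by side on the same toy column (a scikit-learn sketch; the numbers are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, normalize

X = np.array([[1.0], [2.0], [3.0], [10.0]])  # toy column with an outlier

print(MinMaxScaler().fit_transform(X).ravel())    # min-max: squashed into [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # z-score: zero mean, unit variance
print(normalize(X, norm="l2", axis=0).ravel())    # L2: column scaled to unit length
```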

NLP and Ruby to characterize quality of writing

I'd like to take a shot at characterizing incoming documents in my app as either "well" or "poorly" written. I realize this is no easy task, but even a rough idea would be useful. I feel like the way to do this would be via a naïve Bayes classifier with two classes, but am open to suggestions. So two questions:
is this method the optimal (taking into account simplicity) way to do this, assuming a large enough training db?
are there libraries in Ruby (or anything integratable, JRuby or whatever) that I can plug into my Rails app to make this happen with little fuss?
Thanks!
You might try using vocabulary vector analysis. Some of it is covered here:
http://en.wikipedia.org/wiki/Semantic_similarity
Basically you build up a corpus of texts that you deem "well-written" or "poorly-written" and count the frequency of certain words. Make a normalized vector for each, and then compute the distance between those and the vector of each incoming document. I am not a statistician, but I'm told it's similar to Bayesian filtering, though it seems to deal with misspellings and outliers better.
This is not perfect, by any means. Depending on how accurate you need it to be, you will probably still need humans to make the final judgement. But we've had good luck using it as a pre-filter to reduce the number of reviewers.
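A rough sketch of that procedure in Python (the corpora here are placeholder strings; tokenization is simplified to whitespace splitting):

```python
import math
from collections import Counter

def vectorize(text):
    # Normalized (unit-length) word-frequency vector for a text.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def cosine(a, b):
    # Cosine similarity between two sparse frequency vectors.
    return sum(v * b.get(w, 0.0) for w, v in a.items())

good = vectorize("text of documents you deem well written goes here")
bad = vectorize("text of documents you deem poorly written goes here")
doc = vectorize("an incoming document to classify")

# Label the document by whichever reference corpus it is closer to.
print("well" if cosine(doc, good) > cosine(doc, bad) else "poorly")
```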
Another simple algorithm to check out is the Flesch-Kincaid readability metric. It is quite widely used and should be easy to implement. I assume one of the Ruby NLP libraries has syllable methods.
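For reference, the Flesch-Kincaid grade level is just a formula over word, sentence, and syllable counts; a Python sketch (the syllable counter is a crude vowel-group heuristic, not a proper NLP method):

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(flesch_kincaid_grade("The cat sat on the mat. It was a sunny day."))
```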
You may find this paper by Burstein, Chodorow, and Leacock on the Criterion essay evaluation system interesting; it gives a pretty good high-level overview of how one particular system did essay evaluation as well as style correction.
