Useful Entry-Level Resources for Machine Learning [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 11 years ago.
I am looking for some entry level posts about machine learning. Can anyone suggest anything for somebody new to this subject?

By 'posts' I'll assume you mean any resource available online.
I recommend two groups of resources:
First, find machine learning blogs whose author's preferred language is the same as yours. In my experience, reading a blog post on a single subject (e.g., SVM) while reading through the source code the author supplies with it is about the best way for a programmer to learn ML. Two excellent examples are the blogs Smell the Data (Python) and Igvita (Ruby). Each contains several tutorial-style posts describing specific ML techniques, with close walk-throughs of the posted source code. Igvita, in particular, has excellent tutorials with working Ruby code on Support Vector Machines, Decision Trees, Singular Value Decomposition, and Ensemble Methods. An upper-level undergraduate course could be taught based solely on the ML posts in either blog.
Second, I highly recommend VideoLectures.net.
This is by far the best source, free or paid, that I have found for very high-quality video lectures and tutorials on machine learning (both in video quality and in presentation content). The target audience for these video lectures ranges from beginner (some lectures are specifically tagged as "tutorials") to expert; most of them seem to be somewhere in the middle.
All of the lectures and tutorials are taught by highly experienced professionals and academics, and in many instances the lecturer is the leading authority on the topic he/she is lecturing on. The site is also 100% free.
The one disadvantage is that you cannot download the lectures and store them in, e.g., iTunes; however, nearly every lecture has a set of slides which you can download (or, conveniently, view online as you watch the presentation).
A few that I've watched and can recommend highly:
Semi-Supervised Learning Approaches
Introduction to Machine Learning
Gaussian Process Basics
Graphical Models
k-Nearest Neighbor Models
Introduction to Kernel Methods

Machine learning is such a broad topic. I would start with Wikipedia and focus in on areas that you find interesting.
Also, you could visit the Stack Exchange site for machine learning.

Stanford published videos and materials from a set of engineering courses at http://see.stanford.edu
One course by Andrew Ng focuses on Machine Learning techniques
http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
The course is also available on iTunes U
It's a really good course from someone who obviously knows the field well, but he spends a lot of the time deriving mathematical results, so if you're rusty in linear algebra or prob/stats, you might need a refresher first.

I think the best that I know of are:
Stanford's Lectures on Machine Learning
Books: (In decreasing order of ease of understanding - IMHO)
Machine Learning: An Algorithmic Perspective by Stephen Marsland
Pattern Recognition and Machine Learning by Christopher Bishop
Introduction to Machine Learning - Ethem Alpaydin

Related

Tips for writing an algorithm for paraphrasing sentences(machine learning)

I am doing a university project and need to train an algorithm to rephrase sentences. What can you advise for the implementation? Is it possible to use a translator, translating into another language, to end up with a paraphrased sentence? I also want to use Word2Vec; is that a bad idea?

This kind of broad-advice question, about paraphrasing text, a very tough problem that is still an active research area, would be better answered by surveying the research literature.
A great site for searching relevant papers – and then finding other related papers once you've set some positive examples – is http://www.arxiv-sanity.com/.
Searching for [paraphrasing] or [summarization] would give you a running start in seeing major techniques & their limitations. And, once you start bookmarking papers by the little 'disk' icon, it can autosuggest important related papers... so even if your 1st few finds are tangential or far-from-usefulness, it can lead you to the seminal papers, & prevailing cutting-edge algorithms/libraries, pretty quickly.
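On the Word2Vec idea from the question: one naive baseline is to swap each word for its nearest neighbour in embedding space. The sketch below only illustrates that idea. The 3-dimensional vectors in `TOY_VECTORS` are made up, and the helper names are hypothetical; a real system would load vectors from a trained model.

```python
import math

# Hypothetical toy embeddings; a real system would load trained Word2Vec vectors.
TOY_VECTORS = {
    "big":   [0.9, 0.1, 0.0],
    "large": [0.8, 0.2, 0.1],
    "small": [-0.7, 0.3, 0.1],
    "cat":   [0.0, 0.9, 0.4],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_word(word, vectors):
    """Return the other vocabulary word whose vector is closest by cosine similarity."""
    target = vectors[word]
    candidates = [w for w in vectors if w != word]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

def substitute(sentence, vectors):
    """Replace each known word with its nearest neighbour; leave unknown words alone."""
    return " ".join(
        nearest_word(w, vectors) if w in vectors else w
        for w in sentence.split()
    )
```

With these toy vectors, `substitute("the big dog", TOY_VECTORS)` swaps "big" for "large". In trained embeddings, though, a word's nearest neighbours are often related words rather than true synonyms (antonyms included), which is one reason word-by-word substitution alone tends to be a weak paraphraser.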

Which datamining tool to use? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Can somebody explain to me the main pros and cons of the best-known open-source data mining tools?
Everywhere I read that RapidMiner, Weka, Orange, KNIME are the best ones.
look at this blog post
Can somebody do a quick technical comparison in a short bullet list?
My needs are the following:
It should support classification algorithms (Naive Bayes, SVM, C4.5, kNN).
It should be easy to implement in Java.
It should have understandable documentation.
It should have reference production projects or use cases that run on it.
Some additional benchmark comparison, if possible.
Thanks!

First, there are pros and cons to each tool on your list. From personal experience, though, I would suggest Weka: it is incredibly simple to embed in your own Java application using the Weka jar file, and it has its own self-contained tools for data mining.
RapidMiner seems to be a commercial product offering an end-to-end solution. However, most of the examples I have seen of external code integrated with RapidMiner are in Python and R script, not Java.
Orange offers tools that seem to be targeted primarily at people who have less need for custom integration into their own software but want an easier time with user interaction. It is written in Python, the source is available, and user add-ons are supported.
KNIME is another commercial platform offering end-to-end solutions for data mining and analysis, providing all the tools required. It has various good reviews around the internet, but I haven't used it enough to advise you or anyone else on its pros and cons.
See here for knime vs weka
Best data mining tools
As I said, Weka is my personal favorite as a software developer, but I'm sure other people have varying reasons and opinions on why to choose one over the other. I hope you find the right solution for you.
Also, per your requirements, Weka supports the following:
Naive Bayes
SVM
C4.5
KNN
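For readers who have not met these algorithms yet, kNN is the simplest of the four to write out. The sketch below is plain Python on made-up 2-D points, not Weka's API; the function name and data are hypothetical. (In Weka itself, as far as I know, the kNN classifier is called IBk and the C4.5 implementation is called J48.)

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D data, purely for illustration.
TRAIN = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
```

Calling `knn_predict(TRAIN, (1.1, 1.0))` votes among the three closest points, all of which carry label "a" here.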

I have tried Orange and Weka with a 15K-record database and found problems with memory management in Weka: it needed more than 16 GB of RAM, while Orange could handle the database without using that much. Once Weka reaches the maximum amount of memory it crashes, even if you raise the limit in the ini file that tells the Java virtual machine how much memory to use.

I recently evaluated many open source projects, comparing and contrasting them with regard to the decision tree machine learning algorithm. Weka and KNIME were included in that evaluation. I covered the differences in algorithm, UX, accuracy, and model inspection. You might choose one or the other depending on what features you value most.

I have had positive experience with RapidMiner:
a large set of machine learning algorithms
machine learning tools - feature selection, parameter grid search, data partitioning, cross validation, metrics
a large set of data manipulation algorithms - input, transformation, output
applicable to many domains - finance, web crawling and scraping, NLP, images (very basic)
extensible - one can send data to and receive data from other technologies: R, Python, Groovy, shell
portable - can be run as a java process
developer friendly (to some extent, could use some improvements) - logging, debugging, breakpoints, macros
I would have liked to see something like RapidMiner in terms of user experience, but with the underlying engine based on python technologies: pandas, scikit-learn, spacy etc. Preferably, something that would allow moving back and forth from GUI to code.

The intersection of Machine Learning and Programming Languages fields

While my research area is in Machine Learning (ML), I am required to take a project in Programming Languages (PL). Therefore, I'm looking to find a project that is inclined towards ML.
One intersection I know of between the two fields is Natural Language Processing (NLP), but I couldn't find concrete papers in that topic that are related to PL; perhaps due to my poor choice of keywords in the search query.
The main topics in the PL course are: Syntax & Semantics, Static Program Analysis, Functional Programming, Concurrency, and Logic Programming.
If you could suggest papers or keywords that are Machine Learning enthusiast friendly, that would be highly appreciated!

Another very important intersection in these fields is probabilistic programming languages, which provide probabilistic inference over models specified as actual computer programs. It's a growing research field, including a recently started DARPA program on this topic.
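To give a feel for "inference over models specified as actual computer programs", here is a minimal sketch in plain Python. The model (a coin with one of three possible biases) and the function names are hypothetical, and rejection sampling is just the simplest inference method to show; real probabilistic programming languages use far more efficient ones.

```python
import random

def model(rng):
    """Generative program: pick a coin bias, then flip the coin 5 times."""
    bias = rng.choice([0.2, 0.5, 0.8])            # uniform prior over three biases
    flips = [rng.random() < bias for _ in range(5)]
    return bias, flips

def posterior_by_rejection(observed_heads, samples=20000, seed=0):
    """Estimate P(bias | number of heads) by keeping only runs that match the data."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(samples):
        bias, flips = model(rng)
        if sum(flips) == observed_heads:          # condition on the observation
            counts[bias] = counts.get(bias, 0) + 1
    total = sum(counts.values())
    return {b: c / total for b, c in sorted(counts.items())}
```

The point is that `model` is an ordinary program, and the inference routine treats it as a black box: `posterior_by_rejection(5)` runs it many times and reports how often each bias produced five heads.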

If you are interested in NLP, then I would focus on two aspects of listed PL disciplines:
Syntax & Semantics: this is very closely related to the NLP field, where in most cases understanding is based on various language grammars. Searching for papers on language modeling, information extraction, or deep parsing would yield dozens of great research topics that are heavily related to syntax/semantics problems.
Logic programming: in the "good old years" people believed this was the future of AI. Even though that is not (currently) true, it is still quite widely used for reasoning in some fields. In particular, Prolog is a good example of a language that can be used to reason (for example, spatial-temporal reasoning) or even to parse language (due to its "grammar-like" productions).
If you wish to tackle a more ML-related problem rather than NLP, you could focus on concurrency (parallelism), as it is a very hot topic: making ML models more scalable, more efficient, "bigger, faster, stronger" ;) Just look up keywords like GPU machine learning, large scale machine learning, scalable machine learning, etc.

I also happen to know that there's a project at the University of Edinburgh on using machine learning to analyse source code. Here's the first publication that came out of it

Machine learning/information retrieval project

I’m reading towards an M.Sc. in Computer Science and have just completed the first year of the course. (This is a two-year course.) Soon I have to submit a proposal for the M.Sc. project. I have selected the following topic.
“Suitability of machine learning for document ranking in information retrieval system”. Researchers have been using various machine learning algorithms for ranking documents. So as the first phase of the project I will be doing a complete literature survey and finding out advantages/disadvantages of current approaches. In the second phase of the project I will be proposing a new (modified) algorithm in order to overcome the limitations of current approaches.
My question is whether this type of project is suitable as an M.Sc. project. Moreover, if somebody has an interesting idea in the information retrieval field, would you be willing to share it with me?
Thanks

Ranking is always the hardest part of any information retrieval system. I think it is a very good topic, but you have to take care to define the scope of the work as soon as possible. You will probably not be able to develop a new IR engine, but you could build a prototype based on, e.g., Apache Lucene.
There are currently many datasets available, including the Stack Overflow data dump, that provide all the information you need to define a rich feature vector for your machine learning ranking algorithm (number of points, time, popularity of a tag, topics mined from previous questions, etc.). In this part of the work you could, e.g., classify the types of features (user-specific, semantic, such as a software name in the title) and perform a series of experiments to learn which features matter most for a given dataset.
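As a rough sketch of what such a feature vector and a ranker over it might look like: the field names (`score`, `age_days`, `tag`), the weights, and the data below are all made up for illustration and are not the actual data-dump schema. A real learning-to-rank method would fit the weights from relevance judgments instead of hand-picking them.

```python
def features(post, tag_popularity):
    """Turn a post (a dict with hypothetical fields) into a numeric feature vector."""
    return [
        float(post["score"]),                        # votes on the post
        float(post["age_days"]),                     # time since posting
        float(tag_popularity.get(post["tag"], 0)),   # how common the tag is
    ]

def rank(posts, weights, tag_popularity):
    """Order posts by a linear score (weights dot features), highest first."""
    def score(post):
        return sum(w * x for w, x in zip(weights, features(post, tag_popularity)))
    return sorted(posts, key=score, reverse=True)

# Made-up data, purely for illustration.
POSTS = [
    {"id": 1, "score": 10, "age_days": 400, "tag": "lucene"},
    {"id": 2, "score": 3,  "age_days": 5,   "tag": "lucene"},
    {"id": 3, "score": 7,  "age_days": 30,  "tag": "solr"},
]
TAG_POPULARITY = {"lucene": 50, "solr": 20}
WEIGHTS = [1.0, -0.01, 0.05]   # hand-picked; a learner would fit these
```

Zeroing one weight and re-ranking (for example `rank(POSTS, [0.0, -0.01, 0.05], TAG_POPULARITY)`) is a crude way to see how much the ordering depends on a single feature, which is the kind of experiment described above.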
A second direction for such a project is how to perform the learning efficiently. The motivation is the quantity of data on the web or in community forums, and the way forums change over time (important if you use community-specific features), e.g., changes in technologies, new software releases, etc.
There are many other topics related to search and machine learning. The best idea is to search scholar.google.com for recent survey papers on ranking, machine learning, and search to learn what the state of the art is. The very next step would be to talk with your M.Sc. supervisor.
Good luck!

Everything you said is good and should be done, but you forgot the most important part:
Prove that your algorithm is better and/or faster than other algorithms, with good experiments and maybe some statistics (p-value, confidence interval).
If you do that and convince people that your algorithm is useful you surely will not fail :)
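One way to get a confidence interval of the kind mentioned above is a paired bootstrap over per-query scores. This is a minimal sketch, assuming both rankers were evaluated on the same queries; the score lists are made up and the function name is hypothetical.

```python
import random
import statistics

def paired_bootstrap_ci(scores_a, scores_b, resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-query difference (A minus B)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    means = []
    for _ in range(resamples):
        # Resample queries with replacement, keeping each A/B pair together.
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(statistics.fmean(sample))
    means.sort()
    lo = means[int((alpha / 2) * resamples)]
    hi = means[int((1 - alpha / 2) * resamples) - 1]
    return lo, hi

# Made-up per-query scores for two rankers on the same 10 queries.
NEW = [0.72, 0.65, 0.80, 0.58, 0.77, 0.69, 0.74, 0.61, 0.83, 0.70]
OLD = [0.68, 0.60, 0.79, 0.55, 0.70, 0.66, 0.71, 0.60, 0.78, 0.65]
```

If the interval returned by `paired_bootstrap_ci(NEW, OLD)` lies entirely above zero, that is some evidence the new ranker is better on this query set; ten queries is far too few for a real experiment.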

DSP Algorithms Book [closed]

Closed 6 years ago.
I'm looking for a book similar to "Introduction to Algorithms" by Thomas Cormen geared towards DSP algorithms. Is there anything as thorough as Cormen on the DSP market?
EDIT
I should say I'm looking for a book that's analogous to The Joy of Cooking.

Disclaimer - I am not familiar with the Cormen book so I'm not quite sure what you're looking for.
I'm a huge fan of "A Digital Signal Processing Primer" by Ken Steiglitz. It introduces DSP concepts like sampling, as well as simple filtering implementations, without relying just on math for explanation. Cookbook equivalent: You know how to boil water on a stove, but you're nervous about the rest.
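As an example of the kind of "simple filtering implementation" meant here, a moving-average filter fits in a few lines. This is my own sketch in Python, not code from the book, and the function name is hypothetical.

```python
def moving_average(signal, width):
    """Causal FIR low-pass: each output is the mean of the last `width` inputs.

    Samples before the start of the signal are treated as zero.
    """
    out = []
    for n in range(len(signal)):
        window = signal[max(0, n - width + 1): n + 1]
        out.append(sum(window) / width)
    return out
```

Feeding it a signal that alternates between 1 and -1 with `width=2` gives zeros after the first sample, which shows the filter removing the highest-frequency component while a constant signal passes through unchanged.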
A more advanced book, more of a practitioner's handbook than a text, is "Theory and Application of Digital Signal Processing" by Lawrence Rabiner and Bernard Gold. Their explanation of the overlap-save FFT technique for convolution, in particular, is the best I've ever come across. Cookbook equivalent: Maybe Joy of Cooking, maybe the Cordon Bleu tome.
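For reference, the overlap-save idea can be sketched as follows. This is my own minimal Python version, not the book's presentation: it uses a textbook recursive radix-2 FFT so the example stays self-contained, and the function names and default block size are assumptions.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. No 1/N scaling."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def overlap_save(x, h, block=8):
    """Convolve real sequence x with FIR filter h using overlap-save.

    `block` is the FFT size: a power of two, larger than len(h) - 1.
    """
    m = len(h)
    step = block - (m - 1)                    # new output samples per block
    H = fft(list(h) + [0.0] * (block - m))    # filter spectrum, computed once
    out_len = len(x) + m - 1
    # Prepend m-1 zeros as history and pad the tail so every block is full.
    padded = [0.0] * (m - 1) + list(x) + [0.0] * (block + m)
    y = []
    pos = 0
    while len(y) < out_len:
        seg = padded[pos:pos + block]         # overlaps previous block by m-1
        Y = [a * b for a, b in zip(fft(seg), H)]
        circ = [v.real / block for v in fft(Y, inverse=True)]
        y.extend(circ[m - 1:])                # first m-1 samples are aliased; drop them
        pos += step
    return y[:out_len]

def direct_convolve(x, h):
    """Reference O(N*M) linear convolution, for checking the result."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y
```

Each block reuses the last `len(h) - 1` input samples of the previous one, and the first `len(h) - 1` outputs of every circular convolution are thrown away because they wrap around. The result should match `direct_convolve` up to floating-point rounding.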
And "Telecommunications Breakdown" by Richard Johnson and William Sethares is great for taking some DSP concepts and bringing them to life by implementing a radio in software. Cookbook equivalent: A tour through a specific cuisine, and explains what "braising" is along the way.
Hope these are of use to you!

For theory, I like Understanding DSP by Rick Lyons, which also has some nice "recipe-type nuggets".
More practical, and much more "nuggetty" is Streamlining DSP, same author. There's some really interesting stuff in there (IMHO!). Some of it is of the "lost knowledge" variety - especially in these days of just running Matlab's filter design functions. Some of it relates to limited hardware machines (which is great for tiny microcontroller or FPGA implementations).
The articles are written by serious, practicing DSP engineers (many of whom hang out on news:comp.dsp) in a very accessible style.
(I'm afraid I'm no good with cooking analogies though :)

Just for the record and benefit to others, I would recommend The Scientist and Engineer's Guide to Digital Signal Processing.
This is a good book for beginners.

There are a few online books available as well at the great DSPRelated.com:
INTRODUCTION TO DIGITAL FILTERS WITH AUDIO APPLICATIONS by JULIUS O. SMITH
MATHEMATICS OF THE DISCRETE FOURIER TRANSFORM (DFT) WITH AUDIO APPLICATIONS by JULIUS O. SMITH

This is not a book, but I'm sure it'll be a valuable resource: the École Polytechnique Fédérale de Lausanne is starting a free online course on digital signal processing on February 18th 2013: https://www.coursera.org/course/dsp.
Also, the guys teaching it co-authored a book on the topic: http://www.sp4comm.org/

A second vote for the Rick Lyons book. You might also want to get a couple of DSP "bibles", e.g. Oppenheim & Schafer and Proakis & Manolakis, which are more theoretical but cover more ground.

The DSP handbook: algorithms, applications and design techniques - Bateman, Andrew, Paterson-Stephens, Iain 2002
and
Introduction to digital signal processing - Meddins, Bob 2000
Have basically made my ADSP module a breeze (so far). They are also at the top of the suggested reading list. As such, both are fairly beginner friendly, and the latter includes Matlab examples.
The former is probably more Delia, while the latter is more 'my first cook book'.

I will also add Vetterli, Kovačević, and Goyal's Foundations of Signal Processing, which can be downloaded for free.
