What is the best "turnkey" stemming algorithm? [closed] - comparison

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I need a good stemming algorithm for a project I'm working on. It was suggested that I look at the Porter Stemmer. When I checked out the page on the Porter stemmer I found that it is deprecated now in favor of the "Snowball" stemmer.
I need a good stemmer, but I can't really spend significant time implementing (or optimizing) my own. What is the best "off the shelf", freely available stemmer? Are there any non-free stemmers available for a reasonable price? Or, is the Snowball stemmer my best bet?

I decided to go with the Porter2 stemmer. The Porter stemmer seemed to be the standard, but on the author's own page he recommends the "Snowball (Porter2)" stemmer instead. There is a link to a C port on that page.

It really depends on how you're planning to apply it. The Natural Language Toolkit (http://nltk.sourceforge.net) has a number of stemmers implemented in it that should be able to handle most applications. I prefer the Morphy stemmer.
Of course, it's available in Python, so if you're working with another language, you can always look through the code to glean the algorithm and transfer it to your language of choice. Python is highly readable.
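To make concrete what a stemmer does, here is a minimal suffix-stripping sketch in Python. It is not the Porter or Snowball algorithm: the rule table, the minimum stem length and the `toy_stem` name are all made up for illustration, and a real project should use one of the libraries mentioned above.

```python
# Toy suffix stripper: NOT Porter/Snowball, only an illustration of the idea.
# The rule table and the minimum stem length are assumptions for this demo.
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "i"),    # ponies  -> poni
    ("ss", "ss"),    # glass   -> glass (protects "ss" from the "s" rule)
    ("ing", ""),     # walking -> walk
    ("ed", ""),      # walked  -> walk
    ("s", ""),       # cats    -> cat
]


def toy_stem(word, min_stem_len=3):
    """Apply the first matching rule that leaves at least min_stem_len letters."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= min_stem_len:
                return stem + replacement
    return word
```

With these rules, "walking" and "walked" both reduce to "walk", but "running" comes out as "runn". Handling cases like that is exactly the part the published algorithms have already worked out.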

Related

Which parsec-like libraries for OCaml are recommended for actual use? [closed]

Closed 9 years ago.
The Google tells me there are several parsec-like libraries for OCaml: Batteries' ParserCo, Planck, Mparser, PCL, and ocaml-parsec. My problem is knowing which one to choose. Can someone give me some feedback concerning stability, active maintenance, quality of documentation, etc?
I have a vague idea of what ParserCo, Planck and PCL look like, and I would start with Planck, expecting to find some rough edges and to evolve the library a bit myself as I use it. None of them is really well documented, but Planck has some "serious" test cases (parsing the OCaml grammar itself), and the developer, Jun Furuse, is responsive and may be interested in getting it up to shape.
That said, parser combinator libraries are not that popular in the OCaml world. We still quite actively use parser generators. If you don't have strong opinions either way, I recommend trying Menhir, which is quite polished and nice to use (and also actively maintained).

when is it most appropriate to use a micro framework? (instead of something like rails, django or catalyst) [closed]

Closed 9 years ago.
I have spent some time familiarizing myself with
rails (ruby),
django...and other things like zope and pylons/pyramid (python),
catalyst (perl)
but often find myself wanting to use
sinatra (ruby)
bottle...or flask...(python)
dancer...(perl)
When I'm about to start a new project, I'm not entirely sure which I should use.
What should be the deciding factor that makes me switch from a micro framework to something more substantial? Is it just when I would otherwise have too much SQL to write? I think not, because if that were the case I could just use an ORM library/module.
My main issue is a fear of choosing something that other developers would not understand if someone else needed to fix the site at a later point in time. Still, I am not sure what should inform my opinion.
With micro frameworks you have more freedom in your choice of libraries: you can add whatever you think is right. In large frameworks such as Django, much is already "bolted on", and there are established rules and best practices for how to write certain things.

Book with vslam [closed]

Closed 10 years ago.
I am looking for a book where some monocular/visual SLAM is described and implemented.
Can you list and recommend some?
I'd like to use OpenCV but that is not a requirement.
I don't know of a book with a description of such an algorithm, but there's a complete open source implementation (in C++) of a vslam system available as part of the Robot Operating System. It uses SURF descriptors and vocabulary trees for place recognition, and bundle adjustment for SLAM. It does use OpenCV heavily as it's made by the same people. See the website here. I can't say for sure as they don't mention and I haven't looked in great detail, but their implementation seems to be based on, or at least is similar to, this paper.
Edit: The paper linked above was actually written by the people who implemented the vslam system given above, it appears. So it is definitely a good resource for understanding it.
I don't know of a book, but PTAM may be useful. The ISMAR 2007 paper by Klein describes the system, and the source code is available so you can check the details.
Of course, PTAM is just a (good) method in the SLAM field.

What is the most accurate open-source tool for sentence splitting? [closed]

Closed 10 years ago.
I need to split text into sentences. I'm currently playing around with OpenNLP's sentence detector tool. I've also heard of the NLTK and Stanford CoreNLP tools. What is the most accurate English sentence detection tool out there? I don't need many NLP features, only a good tool for sentence splitting/detection.
I've also heard about Lucene...but that may be too much. But if it has a kick-ass sentence detection module, then I'll use it.
NLTK includes an implementation of the Punkt tokenizer described in this paper. I don't know if it's the absolute best around but it's very very good, it's lightweight and easy to use, and it's free.
Check the LingPipe implementation: http://alias-i.com/lingpipe/docs/api/com/aliasi/sentences/IndoEuropeanSentenceModel.html
Their model is quite powerful and easy to implement: it checks a few pre/post rules (i.e. regexps) at every possible sentence split, and that's all. I found it works better than the ones in GATE and OpenNLP.
There is another open source project that supports this heuristic model, as an example: http://code.google.com/p/graph-expression/wiki/SentenceSplitting
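The pre/post rule idea can be sketched in a few lines of Python. This is only an illustration of the heuristic, not LingPipe's actual model: the abbreviation list, the regexes and the `split_sentences` name are assumptions, and a real model ships far more rules.

```python
import re

# Hypothetical abbreviation list; a real model would use a much larger one.
ABBREVIATIONS = {"mr", "mrs", "dr", "prof", "e.g", "i.e", "etc", "vs"}

# A candidate split point: sentence-final punctuation followed by whitespace.
CANDIDATE = re.compile(r"[.!?]+\s+")

# Post rule: a new sentence starts with a capital or digit, optionally quoted.
SENTENCE_START = re.compile(r"[\"'(\[]?[A-Z0-9]")


def split_sentences(text):
    """Split text at candidate points that pass both the pre and post rule."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        before = text[start:match.start()]
        after = text[match.end():]
        # Pre rule: the token before the punctuation is not an abbreviation.
        words = before.split()
        last_token = words[-1].lower() if words else ""
        if last_token in ABBREVIATIONS or not SENTENCE_START.match(after):
            continue
        sentences.append(before.strip() + match.group().rstrip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For example, in "Dr. Smith arrived at 5 p.m. yesterday. He was late!" the period after "Dr" is rejected by the pre rule and the one after "p.m" by the post rule, so the text splits into two sentences.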
Perl is a text processing language that is an excellent and simple resource for text mining. It has absolutely no problem doing sentence splitting.
www.perl.org

collaborative filtering in rails [closed]

Closed 10 years ago.
I'm looking for a solution for collaborative filtering in rails or even possible examples. So far I have only found acts_as_recommendable which looks useful but I noticed it hasn't had any updates in the last 2 years.
Does anyone know of any other solutions and/or examples?
Have you evaluated Apache Mahout? It is a Java-based solution, with HTTP access to the recommendation engine.
Reference:
Introducing Mahout
This pertains to the examples part of your question, as both the libraries mentioned below are in Java.
The article referenced in the answer above, written by the author of the Apache Mahout-Taste library, has neat examples, the source code for the examples (using 2.5 GB of Wikipedia data), and excellent packaging that lets you run the examples and see them in action in a few minutes.
Apache Mahout-Taste
The specific section is Building a recommendation engine
Here is another open source recommendation engine.
easyrec
In 2013, there's the ActiveRecord Reputation System gem by Twitter. There's also a free RailsCast on the topic.
Here is a 50-line recommendation system in Ruby: http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
There is one link there that doesn't work (where he says "mathies click here", it points to Using linear algebra for intelligent information retrieval by Berry et al.: http://www2.denizyuret.com/ref/berry/berry95using.pdf)
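For a feel of what collaborative filtering does, here is a minimal user-based sketch in Python. It uses cosine similarity between users rather than the SVD approach from the linked article, and the ratings, item names and function names are made up for illustration.

```python
from math import sqrt

# Made-up ratings (user -> item -> score), purely for illustration.
RATINGS = {
    "alice": {"item_a": 5, "item_b": 3, "item_c": 4},
    "bob": {"item_a": 5, "item_b": 3, "item_d": 5},
    "carol": {"item_a": 1, "item_b": 5, "item_e": 4},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted mean score."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, score in other_ratings.items():
            if item in ratings[user]:
                continue  # skip items the user already rated
            totals[item] = totals.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(totals, key=lambda i: totals[i] / weights[i], reverse=True)
    return ranked[:top_n]
```

Calling `recommend("alice", RATINGS)` ranks item_d ahead of item_e, because bob's tastes match alice's more closely than carol's do. In a Rails app the same logic could live in a background job that reads ratings from the database, though at any real scale a dedicated engine such as the ones mentioned above is likely the better fit.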
