Does anyone have experience doing sentiment analysis with the liblinear algorithm? Has anyone used the liblinear-ruby-swig gem?
Please suggest something to start with.
I have used liblinear a lot for other classification tasks, but not for sentiment analysis.
Are you interested in using liblinear specifically, or in doing sentiment analysis?
For simple sentiment analysis look at
https://chrismaclellan.com/blog/sentiment-analysis-of-tweets-using-ruby
The sad_panda gem (https://rubygems.org/gems/sad_panda) is similar to an R library I have used in the past. It has tools for both polarity and emotion classification of text (such as "sadness", "anger", "joy", and a few others).
There is not much work in Ruby for sentiment analysis or machine learning in general. One of the best machine learning libraries is Weka, so you can consider using it with JRuby.
That said, I have created an entry-level gem, and I am planning to enhance it by porting some of the Weka algorithms to Ruby.
I have started using Julia. I read that it is faster than C.
So far I have seen some libraries like Knet and Flux, but both are for deep learning.
Also, there is a package, PyCall, to use Python inside Julia.
But I am interested in machine learning too, so I would like to use SVM, Random Forest, KNN, XGBoost, etc., but in Julia.
Is there a native library written in Julia for Machine Learning?
Thank you
A lot of algorithms are available in dedicated packages, such as BayesNets.jl.
For "classical machine learning" there is MLJ.jl, a pure Julia machine learning framework written at the Alan Turing Institute and under very active development.
For neural networks, Flux.jl is the way to go in Julia. It is also very active, GPU-ready, and allows all the exotic combinations that exist in the Julia ecosystem, like DiffEqFlux.jl, a package that combines Flux.jl and DifferentialEquations.jl.
Just wait for Zygote.jl, a source-to-source automatic differentiation package that will be some sort of backend for Flux.jl.
Of course, if you're more confident with Python ML tools you still have TensorFlow.jl and ScikitLearn.jl, but OP asked for pure Julia packages and those are just Julia wrappers of Python packages.
Have a look at this kNN implementation and this one for XGBoost.
There are SVM implementations, but they are outdated and unmaintained (search for SVM .jl). But, really, think about other algorithms for much better prediction quality and model construction performance. Have a look at the OLS (orthogonal least squares) and OFR (orthogonal forward regression) algorithm family. You will easily find detailed algorithm descriptions that are easy to code in any suitable language. However, there is currently no Julia implementation I am aware of. I found only Matlab implementations and made my own Java implementation some years ago. I have plans to port it to Julia, but that currently has no priority and may take some years. Meanwhile - why not code it yourself? You won't find any other language that makes it easier to code a prototype and turn it into a highly efficient production algorithm running under heavy load on a CUDA-enabled GPGPU.
I recommend this quite new publication, to start with: Nonlinear identification using orthogonal forward regression with nested optimal regularization
I would like to do sentiment analysis at the document level, but I am trying to do it for Nepali, so I don't have any resources. I can't use a Naive Bayes classifier as I don't have any labelled data, and I can't go via WordNet as no Nepali WordNet exists. The papers I read generally had labelled data or a SentiWordNet for other languages.
I would like to know these things:
Which approach should I use in the above case for sentiment analysis?
Is there any method for me to dynamically generate labels for data?
Since you don't have any labelled data, have a look at this GitHub repo; feel free to fork it.
It has the code for a neural network for handwriting recognition in Java. Jeff Heaton has made it easy for us, with a nice UI; you can train this model to recognize Nepali.
And for sentiment analysis, you can try OpenNLP, which has some good support; see this blog for beginners.
Also, DL4J is a good deep learning library for Java which can be used for sentiment analysis. It has a good Word2Vec implementation and a lot of support.
These resources will help you; if you have any further doubts, feel free to comment.
I've recently started learning machine learning algorithms. I've written a program in Python from scratch to implement linear regression, but I need some data pairs to use.
There are many datasets on the internet to use.
Have a look here, where you can find many real datasets: UCI (the UCI Machine Learning Repository)
You can use scikit-learn; it has some good built-in datasets. You can refer to this document.
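If you just want (x, y) pairs to check a from-scratch linear regression before moving on to real datasets, one option is to generate them yourself from a known line plus noise, so you can see whether your code recovers the slope and intercept. The following is only a minimal sketch using the Python standard library; the names make_pairs and fit_line, and the chosen slope, intercept, and noise values, are made up for illustration.

```python
import random

def make_pairs(n, slope, intercept, noise_std, seed=0):
    """Generate n (x, y) pairs from y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    pairs = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = slope * x + intercept + rng.gauss(0.0, noise_std)
        pairs.append((x, y))
    return pairs

def fit_line(pairs):
    """Closed-form least-squares fit; returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical ground truth: y = 2x + 1 with a little noise.
pairs = make_pairs(n=200, slope=2.0, intercept=1.0, noise_std=0.5)
slope, intercept = fit_line(pairs)
print(f"recovered slope={slope:.2f}, intercept={intercept:.2f}")
```

Because you know the true line, you can compare your own implementation's output against it, which is harder to do with a real dataset.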
Hi, I'm new to machine learning and therefore looking for a text classification solution. Could someone recommend a nice framework written in Java? I thought about using WEKA, but also heard about MALLET. Which is better, and what are the main differences?
My goal is to classify unlabeled text. For that I prepared about 18 topics and 100 texts for each topic for training.
What would you recommend to do? Would also appreciate a nice little example or hint of how to proceed.
You have a very small text data set, so you could use any library - it wouldn't really matter. More advanced options would require more data than you have to be meaningful, so it's not an issue worth considering. The simple way text classification problems are handled is to use a Bag of Words model and a linear classifier. Both Weka and MALLET support this.
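To make the Bag of Words plus linear classifier idea concrete, here is a rough, library-free sketch in Python. It is not how Weka or MALLET implement it; it swaps in a plain multiclass perceptron as the linear classifier, and the class name, the whitespace tokenizer, and the four toy training texts are all made up for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count word occurrences."""
    return Counter(text.lower().split())

class Perceptron:
    """Multiclass perceptron: one sparse weight vector per topic."""

    def __init__(self, labels):
        self.weights = {label: {} for label in labels}

    def score(self, label, features):
        w = self.weights[label]
        return sum(w.get(word, 0.0) * count for word, count in features.items())

    def predict(self, features):
        # Ties are broken by label order so results are deterministic.
        return max(sorted(self.weights), key=lambda label: self.score(label, features))

    def train(self, examples, epochs=10):
        for _ in range(epochs):
            for text, label in examples:
                features = bag_of_words(text)
                guess = self.predict(features)
                if guess != label:
                    # Pull the true topic toward these words, push the wrong guess away.
                    for word, count in features.items():
                        right, wrong = self.weights[label], self.weights[guess]
                        right[word] = right.get(word, 0.0) + count
                        wrong[word] = wrong.get(word, 0.0) - count

# Hypothetical toy training set: (text, topic) pairs.
examples = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the new phone has a faster processor", "tech"),
    ("the laptop ships with more memory", "tech"),
]
model = Perceptron(labels=["sports", "tech"])
model.train(examples)
print(model.predict(bag_of_words("a goal in the football match")))
```

With 18 topics and about 100 texts each, the same structure applies; a real library would add better tokenization, term weighting, and a more robust linear model.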
Personally, I find Weka to be a pain and MALLET to be poorly documented (and out of date where documentation exists), so I use JSAT. There is an example of doing spam classification here.
(bias warning, I'm the author of JSAT).
Since your task is fairly simple and, as you mentioned, you're new to ML, I'd recommend you use Weka, as it is easy to use and has a large user community.
Otherwise here are some General Purpose Machine Learning frameworks in Java that you can have a look at:
Datumbox - Machine Learning framework for rapid development of Machine Learning and Statistical applications
ELKI - Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
H2O - ML engine that supports distributed learning on data stored in HDFS.
htm.java - General Machine Learning library using Numenta’s Cortical Learning Algorithm
java-deeplearning - Distributed Deep Learning Platform for Java, Clojure, Scala
JAVA-ML - A general ML library with a common interface for all algorithms in Java
JSAT - Numerous Machine Learning algorithms for classification, regression, and clustering.
Mahout - Distributed machine learning
Meka - An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
MLlib in Apache Spark - Distributed machine learning library in Spark
Neuroph - Neuroph is a lightweight Java neural network framework
ORYX - Simple real-time large-scale machine learning infrastructure.
RankLib - RankLib is a library of learning to rank algorithms
RapidMiner - RapidMiner integration into Java code
Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.
WalnutiQ - object oriented model of the human brain
Weka - Weka is a collection of machine learning algorithms for data mining tasks
Source: Awesome Machine Learning
For a novice to machine learning, what are the learning prerequisites to using Apache Mahout in an efficient way?
I know that a committer to Mahout would need calculus, linear algebra, probability and machine learning before they can contribute anything useful. But does a "User" of Apache Mahout need all of this?
I'm asking this because learning/revising all of the above would take me ages.
Mahout In Action provides a good overview of what you need to know to use Mahout.
Typically, scalable machine learning does not require advanced mathematics for use. It may require serious math to develop, but not necessarily to use.
The primary requirement is that you really understand your data and its origins and what you want to do with it. That understanding doesn't have to come all at once and can be developed over time.
Try to Google the topics below:
Programming Collective Intelligence
Similarity calculation with vectors
The difference between clustering and classification.
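As a starting point for the "similarity calculation with vectors" topic, cosine similarity is one common measure. The sketch below is plain Python, not Mahout's API, and the three users and their rating vectors are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical ratings of the same four items by three users (0 = not rated).
alice = [5, 3, 0, 1]
bob = [4, 2, 0, 1]
carol = [0, 1, 5, 4]

print(f"alice vs bob:   {cosine_similarity(alice, bob):.3f}")
print(f"alice vs carol: {cosine_similarity(alice, carol):.3f}")
```

Users whose ratings point in a similar direction score close to 1, which is the kind of signal a recommender can use to find like-minded users.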