Applying machine learning techniques, and text mining techniques in particular, in a browser environment (mainly JavaScript) or as a web application is not a widely discussed topic.
I want to build my own web application / browser extension that can perform some level of text classification and visualization. Are there any open source projects that apply text mining techniques in a web application or, even better, as a browser extension?
So far, these are the projects and discussions I have gathered after days of random searching:
For text mining in a web application:
http://text-processing.com/ with demo (closed source, with a limited API)
uClassify (closed source, no information about the underlying library)
For machine learning in JavaScript:
Discussion on the possibility of machine learning in JavaScript (mainly arguing that Node.js is going to change the landscape)
brain - JavaScript supervised machine learning
A demo project with Naive Bayes implemented in JavaScript
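To give an idea of how little code a Naive Bayes text classifier needs, here is a minimal sketch. It is written in Python rather than JavaScript and is not taken from the demo project; the class name, the whitespace tokenizer and the toy training sentences are all made up for illustration. It assumes a multinomial model with add-one smoothing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
clf = NaiveBayes()
clf.train("cheap pills buy now", "spam")
clf.train("limited offer buy cheap", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("project status and agenda", "ham")
print(clf.predict("buy cheap pills"))
print(clf.predict("monday project meeting"))
```

The same structure (per-label word counts plus a log-probability sum) should port to JavaScript with plain objects in place of `Counter`.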
For web application text mining, the architectures that I can think of:
Python libraries (e.g. NLTK or scikit-learn) + Django
Java libraries (a lot) + Play! framework
Even R based + rApache
Some popular machine learning libraries:
Python - PyBrain
Apache - Mahout
I'll give you my favourites:
Brain.js
ConvNetJS
It has been 7 years since this question was asked, but there is a chance that machine learning will get native browser support: https://webmachinelearning.github.io/
(just make sure you like the posts in the GitHub issues about adding training capabilities, otherwise you might end up with support for 3rd-party models only :-) )
I am new to the field of machine learning. I am planning to use Python as the programming language for implementing algorithms and Java for the system architecture.
As far as I understand, machine learning is more about modeling data specific to the domain, visualizing the data, and choosing appropriate models and parameters. Implementing the models/algorithms is the last and relatively easy step.
Matlab seems to have everything for machine learning, but it is too expensive and requires learning a new language.
What tools other than a programming language do I need in general for machine learning in enterprise projects? Things like data modeling, visualization, etc.
After a couple of years of trial and error, I would suggest going directly with Python, possibly with scikit-learn or TensorFlow (if you want to go hardcore :).
I also tried R in the past, and while it is a perfectly valid language it has some limitations: it is single-threaded by default, and although there are solutions for that, none are as clean as in Python.
Also, Python seems to be THE language for machine learning. It is easy to learn and fast (depending on the interpreter implementation, of course), and there is huge support for it: lots of tutorials, documentation and, more importantly, libraries that are actively developed and supported.
Finally, I recommend considering Spyder as a good IDE for data science. I also tried Rodeo, but it does not seem as mature and stable as Spyder.
Hope this helps.
Can somebody explain to me the main pros and cons of the best-known open-source data mining tools?
Everywhere I read that RapidMiner, Weka, Orange and KNIME are the best ones.
Look at this blog post
Can somebody do a quick technical comparison in a short bullet list?
My needs are the following:
It should support classification algorithms (Naive Bayes, SVM, C4.5, kNN).
It should be easy to implement in Java.
It should have understandable documentation.
It should have reference production projects or use cases built on it.
Some additional benchmark comparison, if possible.
Thanks!
First, each tool on your list has its pros and cons. From personal experience I would suggest Weka: it is very simple to embed in your own Java application using the Weka JAR file, and it also ships with its own self-contained tools for data mining.
RapidMiner seems to be a commercial product offering an end-to-end solution. However, most of the examples of external integrations with RapidMiner that I have seen are in Python and R scripts, not Java.
Orange offers tools that seem to be aimed primarily at people who have less need for custom integration into their own software but want an easier time with user interaction. It is written in Python, the source is available, and user add-ons are supported.
KNIME is another commercial platform offering end-to-end solutions for data mining and analysis, providing all the tools required. It has various good reviews around the internet, but I haven't used it enough to advise you or anyone else on its pros and cons.
See here for KNIME vs Weka
Best data mining tools
As I said, Weka is my personal favorite as a software developer, but I'm sure other people have their own reasons and opinions on why to choose one over the other. I hope you find the right solution for you.
Also, per your requirements, Weka supports the following:
Naive Bayes
SVM
C4.5
kNN
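For anyone unsure what the last entry in that list actually does, here is a minimal kNN sketch. It is plain Python for brevity, not Weka's Java API, and the function name and the 2-D points are invented for illustration. It assumes Euclidean distance and a simple majority vote.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D points: two loose groups labelled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b"),
]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (3.9, 4.1)))
```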
I have tried Orange and Weka with a database of 15K records and found problems with memory management in Weka: it needed more than 16 GB of RAM, while Orange could handle the database without using that much. Once Weka reaches the maximum amount of memory it crashes, even if you set a higher limit in the ini file that tells the Java virtual machine how much memory to use.
I recently evaluated many open source projects, comparing and contrasting them with regard to the decision tree machine learning algorithm. Weka and KNIME were included in that evaluation. I covered the differences in algorithm, UX, accuracy, and model inspection. You might choose one or the other depending on which features you value most.
I have had positive experience with RapidMiner:
a large set of machine learning algorithms
machine learning tools - feature selection, parameter grid search, data partitioning, cross validation, metrics
a large set of data manipulation algorithms - input, transformation, output
applicable to many domains - finance, web crawling and scraping, nlp, images (very basic)
extensible - one can send data to and receive data from other technologies: R, Python, Groovy, shell
portable - can be run as a java process
developer friendly (to some extent, could use some improvements) - logging, debugging, breakpoints, macros
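As a rough illustration of what the data partitioning and cross validation items in the list above amount to, here is a minimal k-fold split in plain Python. It is not RapidMiner's implementation; the function name is made up, and it assumes contiguous, unshuffled folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # The first n_samples % k folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples)
                 if i < start or i >= start + size]
        yield train, test
        start += size


for train_idx, test_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in exactly one test fold, so averaging a metric over the folds uses every record for evaluation once.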
I would have liked to see something like RapidMiner in terms of user experience, but with the underlying engine based on Python technologies: pandas, scikit-learn, spaCy, etc. Preferably, something that would allow moving back and forth between GUI and code.
Are there any machine learning packages that implement spiking neural networks, or any other stand-alone implementations of them that could get me started?
A Python library named Brian ought to be useful for you.
There's also what I believe is a programming language named NEURON, but Brian is fairly easy to learn, at least for the basics. It did take me a while to figure out how to do a couple of small things, though, since it's a really high-level language or whatnot.
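If it helps to see what such a simulator computes under the hood, here is a minimal sketch of a single leaky integrate-and-fire neuron in plain Python. This is not Brian's API; the function name, the parameter values and the constant input currents are assumptions chosen only for illustration, and the integration is a simple forward Euler step.

```python
def simulate_lif(input_current, dt=1.0, tau=10.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron with forward Euler steps.

    Returns the time-step indices at which the neuron spiked.
    """
    v = v_rest
    spikes = []
    for step, current in enumerate(input_current):
        # The membrane potential leaks toward v_rest and is driven by the input.
        v += dt * (-(v - v_rest) + current) / tau
        if v >= v_thresh:
            spikes.append(step)
            v = v_reset  # reset after a spike
    return spikes


# A constant input above threshold produces regular spikes;
# one below threshold never makes the neuron fire.
strong = simulate_lif([2.0] * 100)
weak = simulate_lif([0.5] * 100)
print(len(strong), len(weak))
```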
There are several other SNN platforms these days that allow you to run classification. I have worked with NeuCube (https://kedri.aut.ac.nz/R-and-D-Systems/neucube), which is a Matlab- and Java-based SNN platform.
Also, check out the Akida Development Environment (ADE) from Brainchip Inc (https://brainchipinc.com/). One of the best features of ADE is that its APIs follow the TensorFlow/Keras structure, and it also supports a CNN2SNN converter so you can use your deep learning models in the SNN domain. SNN models developed on this platform can be deployed on their neuromorphic processor, Akida.
I believe there are other platforms within the SNN domain, such as PyNN and Nengo (which is compatible with running models on Loihi).
Here are links for the Brian simulator
https://github.com/brian-team/brian2
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2605403/
http://briansimulator.org/
You can install the Nengo Loihi library for deployment not only of spiking neural networks but also neuromorphic neural networks.
Here's the link to their website: https://www.nengo.ai/nengo-loihi/v1.0.0/index.html
On Kaggle you can find an example that works with the CIFAR-10 dataset, loaded locally, using the Nengo Loihi library. Here's the link:
https://www.kaggle.com/migueltoms/neuromorphic-ciphar-10-loihi-comparison-of-results
I'm working on a project where I train a text classifier, and I need to create a web app that lets the user enter text for classification. Currently all the code is written in Python and uses the scikit-learn library. I've run into a problem installing scikit-learn on Heroku so that my Python code can run on the server. I don't mind changing everything (the Python language, the Flask web framework, the scikit-learn library, the Heroku hosting service); I just need to get this thing to work :)
Has anyone in the CV community had experience building a web app that uses a learning library online? The hosting should be free, as this project is not commercial, and it would be very nice to keep Python behind the scenes.
N.B. The classifiers that the library should support are multiclass SVM and Naive Bayes.
How about trying Google App Engine? It supports Python (2.5 and 2.7) and can be free.
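Whichever host you end up with, the web-facing part can stay very small. Below is a minimal sketch of a WSGI app using only the standard library. The `classify` function is a hypothetical placeholder standing in for a trained model's predict call, and the `text` query parameter is likewise an assumption of this example.

```python
import json
from urllib.parse import parse_qs


def classify(text):
    """Hypothetical placeholder for a trained model's predict call."""
    return "question" if text.strip().endswith("?") else "statement"


def app(environ, start_response):
    """WSGI app: GET /?text=... returns the predicted label as JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    text = params.get("text", [""])[0]
    body = json.dumps({"text": text, "label": classify(text)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Simulate one request without starting a server.
response = app({"QUERY_STRING": "text=is+this+free%3F"},
               lambda status, headers: None)
print(response[0].decode("utf-8"))
```

To try it locally, `app` can be passed to `wsgiref.simple_server.make_server`. Because it is a plain WSGI callable, it should also run under most Python hosts that accept WSGI apps.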
I have been working on crawling webpages and extracting the elements of the website.
Example:
Given a website, the crawler should return the following sections: header, menu, footer, content, etc.
I was thinking that it would be great if I could use machine learning to train the code to learn how to classify websites.
I tried looking at Python machine learning libraries (e.g. PyBrain), but the examples are very complex.
Can anyone please suggest a library and a tutorial on how to get started with machine learning in Python, with some simple examples?
Thanks!
MLPy may be a simpler start for you.
Here is a link to the documentation on classification. By the way, if you don't know what the classes should look like, maybe you need to cluster your pages rather than classify them.
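To make the clustering suggestion concrete, here is a minimal k-means sketch in plain Python. It does not use MLPy. The two features per page block (relative vertical position and link density), the sample values and the choice of initial centroids are all assumptions made up for this example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute.

    Returns the final centroids and the clusters from the last assignment step.
    """
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the assigned points; keep the old one if empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical features per page block: (vertical position 0..1, link density 0..1).
blocks = [(0.05, 0.9), (0.1, 0.8), (0.08, 0.85),   # menu-like: top, many links
          (0.5, 0.1), (0.6, 0.05), (0.55, 0.15)]   # content-like: middle, few links
centroids, clusters = k_means(blocks, centroids=[blocks[0], blocks[3]])
for cluster in clusters:
    print(cluster)
```

No labels are needed here: the groups come out of the data, and you can inspect them afterwards to decide which one looks like a menu and which one looks like content.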