RedisML: More complex Machine Learning models?

I am looking to use RedisML for more complex machine learning models, such as XGBoost and LSTM. Does the RedisML team have any plans to incorporate these more complex ML models into their module.so?

Related

Does AutoML accept external models?

I used random search and got the best hyperparameters for my model. Can I pass that model to AutoML?
Does AutoML do the random search for the best hyperparameters by itself, or is there something I need to pass?
I presume you're referring to Google Cloud AutoML. It is a cloud-based Machine Learning (ML) platform that offers a no-code approach to building data-driven solutions. AutoML was designed to let both newcomers and experienced machine learning engineers build custom models.
For newcomers, you could use Vertex AI (fully automated) to build an ML model.
For experienced ML engineers, you could also use AutoML Tabular to build a custom model, with the ability to select a model and input the selected hyperparameters.
You can read more details from here

building a predictive function - is this function actually "Prediction" or is it "Classification"?

I found the following statement confusing. This statement upends my basic understanding of predictive machine learning.
"By not thinking probabilistically, machine learning advocates frequently utilize classifiers instead of using risk prediction models."
While learning about cross-validation, I'm learning to create predictive functions that utilize predictive features. Cross-validation then estimates how well the predictive function will work on the testing dataset.
How are "classifiers" and "predictors" not the same things?
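One possible reading of the quoted statement, sketched under assumptions: a risk prediction model reports a probability, while a classifier reports only the label obtained by thresholding it. The logistic function, the example scores, and the 0.5 threshold below are illustrative choices, not taken from the quoted source.

```python
import math

def predicted_risk(score):
    """Logistic function: maps a real-valued model score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))

def classify(score, threshold=0.5):
    """Hard label: keeps only which side of the threshold the risk falls on."""
    return int(predicted_risk(score) >= threshold)

# Hypothetical model scores for three cases.
scores = [-0.1, 0.1, 3.0]
risks = [predicted_risk(s) for s in scores]
labels = [classify(s) for s in scores]

for s, r, label in zip(scores, risks, labels):
    print(f"score={s:+.1f}  risk={r:.3f}  label={label}")
```

The second and third cases receive the same label even though their estimated risks differ widely, and the first two cases receive different labels even though their risks are nearly equal. That lost information is what the distinction is about in this reading.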

Azure ML - Train a model on segments of the data-set

I could really use some help!
The company I work for is made up of 52 very different businesses, so I can't predict at the company level. Instead I need to predict business by business, then roll up the results to give a company-wide prediction.
I have written an ML model in studio.azureml.net
It works great with a 0.947 Coefficient of Determination, but this is for 1 of the businesses.
I now need to train the model for the other 51.
Is there a way to do this in a single ML model rather than having to create 52 very similar models?
Any help would be much appreciated !!!
Kind Regards
Martin
You can use ensembles, which combine several models to improve predictions. The most direct is stacking, where several models are each trained on the entire dataset and their outputs are combined by a further model.
The method that, I think, best fits your problem is close to bagging (bootstrap aggregation). Divide the training set into subsets (one per business), train a separate model on each subset, and combine the results of the individual models. Note that bagging proper draws its subsets by random sampling with replacement, so per-business subsets are a variation on the idea.
Another way is boosting, but it is difficult to implement in Azure ML.
You can see an example in Azure ML Gallery.
Quote from book:
Stacking and bagging can be easily implemented in Azure Machine
Learning, but other ensemble methods are more difficult. Also, it
turns out to be very tedious to implement in Azure Machine Learning an
ensemble of, say, more than five models. The experiment is filled with
modules and is quite difficult to maintain. Sometimes it is worthwhile
to use any ensemble method available in R or Python. Adding more
models to an ensemble written in a script can be as trivial as
changing a number in the code, instead of copying and pasting modules
into the experiment.
You may also have a look at sklearn (Python) and caret (R) documentation for further details.
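As a minimal sketch of the script-based approach the quote describes, the snippet below fits one model per business in a loop and sums the per-business predictions into a company-wide figure. Everything in it is an assumption for illustration: the single numeric feature, the business names, the data values, and the use of a plain least-squares line in place of whatever model the Azure ML experiment uses.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

def fit_per_business(rows):
    """rows: iterable of (business_id, x, y). Returns {business_id: (intercept, slope)}."""
    grouped = defaultdict(lambda: ([], []))
    for business_id, x, y in rows:
        grouped[business_id][0].append(x)
        grouped[business_id][1].append(y)
    return {b: fit_line(xs, ys) for b, (xs, ys) in grouped.items()}

def predict_company(models, next_x):
    """next_x: {business_id: x}. Returns per-business predictions and their sum."""
    per_business = {b: models[b][0] + models[b][1] * x for b, x in next_x.items()}
    return per_business, sum(per_business.values())

# Hypothetical history for two of the businesses: (business, period, revenue).
rows = [
    ("retail", 1, 10.0), ("retail", 2, 12.0), ("retail", 3, 14.0),
    ("logistics", 1, 100.0), ("logistics", 2, 90.0), ("logistics", 3, 80.0),
]
models = fit_per_business(rows)
per_business, company_total = predict_company(models, {"retail": 4, "logistics": 4})
print(per_business, company_total)
```

Adding the other businesses only means adding rows; the loop stays the same, which is the maintainability point the quote makes about scripted ensembles.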

Recurrent Self Organizing Maps in Encog for Unsupervised Clustering with Context

Machine Learning - what a hoot!
I have a little project with which I would like to identify anomalies in unlabeled data. Thus, unsupervised clustering.
However, the sequence of the data is also important, as a single record may not be of interest, but the sequence of records that precede it may make it anomalous.
So I am thinking of building a Recurrent SOM to add the temporal context.
I have trained a few simple machine learning models using Python GraphLab Create, Azure Machine Learning and the Encog ML Framework, but Azure does not seem to provide unsupervised clustering, so I am leaning towards using Encog.
I have looked at Recurrent Neural Networks in Encog, as well as SOM, but I have no idea how to combine the two. Most of the articles online about Feedback/Recurrent SOM are academic.
Are there any good references for doing this with Encog?
A Google search found only one good result for RSOM in Encog: https://github.com/leadtune/encog-java/blob/master/encog-core/src/org/encog/neural/pattern/RSOMPattern.java

Machine-Learning - Concept / Recommendations

Hi, I'm new to machine learning and am looking for a text classification solution. Could someone recommend a good framework written in Java? I thought about using WEKA, but have also heard about MALLET. Which is better, and what are the main differences?
My goal is to classify unlabeled text. To that end I have prepared about 18 topics and 100 texts for each topic for training.
What would you recommend? I would also appreciate a small example or a hint on how to proceed.
You have a very small text data set, so you could use any library; it wouldn't really matter. More advanced options would need more data than you have to be meaningful, so it's not an issue worth considering. The simple way to handle text classification problems is to use a Bag of Words model and a linear classifier. Both Weka and MALLET support this.
Personally, I find Weka to be a pain and MALLET to be poorly documented (and out of date where documentation exists), so I use JSAT. There is an example on doing spam classification here.
(bias warning, I'm the author of JSAT).
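To illustrate the Bag of Words plus linear classifier idea mentioned above, here is a minimal library-free sketch. It is written in Python for brevity rather than Java, and it does not use the Weka, MALLET, or JSAT APIs. The multiclass perceptron, the whitespace tokenizer, and the four toy training texts are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def bag_of_words(text):
    """Word counts, ignoring word order (lowercased, split on whitespace)."""
    return Counter(text.lower().split())

def score(topic_weights, features):
    """Linear score: sum of weight * count over the words in the text."""
    return sum(topic_weights.get(w, 0.0) * c for w, c in features.items())

def predict(weights, features):
    """Pick the topic whose weight vector gives the highest linear score."""
    return max(sorted(weights), key=lambda t: score(weights[t], features))

def train_perceptron(samples, epochs=10):
    """samples: list of (text, topic). Returns {topic: {word: weight}}."""
    weights = {topic: defaultdict(float) for _, topic in samples}
    for _ in range(epochs):
        for text, topic in samples:
            features = bag_of_words(text)
            guess = predict(weights, features)
            if guess != topic:
                # Move weights towards the true topic, away from the wrong guess.
                for word, count in features.items():
                    weights[topic][word] += count
                    weights[guess][word] -= count
    return weights

# Toy training set; a real one would have many texts per topic.
samples = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
]
weights = train_perceptron(samples)

def classify(text):
    return predict(weights, bag_of_words(text))

print(classify("a goal in the football match"))
print(classify("interest rates and the bank"))
```

With 18 topics and about 100 texts each, the same structure applies; a library implementation would add better tokenization, term weighting, and regularization.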
Since your task is fairly simple and, as you mentioned, you're new to ML, I'd recommend Weka, as it is easy to use and has a large user community.
Otherwise here are some General Purpose Machine Learning frameworks in Java that you can have a look at:
Datumbox - Machine Learning framework for rapid development of Machine Learning and Statistical applications
ELKI - Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
H2O - ML engine that supports distributed learning on data stored in HDFS.
htm.java - General Machine Learning library using Numenta’s Cortical Learning Algorithm
java-deeplearning - Distributed Deep Learning Platform for Java, Clojure, Scala
JAVA-ML - A general ML library with a common interface for all algorithms in Java
JSAT - Numerous Machine Learning algorithms for classification, regression, and clustering.
Mahout - Distributed machine learning
Meka - An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
MLlib in Apache Spark - Distributed machine learning library in Spark
Neuroph - Neuroph is a lightweight Java neural network framework
ORYX - Simple real-time large-scale machine learning infrastructure.
RankLib - RankLib is a library of learning to rank algorithms
RapidMiner - RapidMiner integration into Java code
Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.
WalnutiQ - object oriented model of the human brain
Weka - Weka is a collection of machine learning algorithms for data mining tasks
Source: Awesome Machine Learning
