MySQLJDBCDataModel is missing in mahout 0.9 - mahout

I am trying to use DataModel that reads from mysql. Unfortunately the class MySQLJDBCDataModel is missing in mahout 0.9. What is the alternative?

Related

How to export/save/load the actual AutoKeras "super" model, not the underlying tensorflow model

Is there a way to export/save/load a previously trained autokeras model? I understand I can use the following code to save/load the underlying tensorflow best model:
model = reg.export_model()
model.save(MODEL_FILEPATH, save_format="tf")
best_model = load_model(MODEL_FILEPATH, custom_objects=ak.CUSTOM_OBJECTS)
However, in practice that wouldn't work, since my data has been fitted by autokeras, which takes care of data preparation and scaling. I don't think I have access to what autokeras is doing to the input data (X) before actually fitting, so I can't actually use the exported tensorflow best model to predict labels for new samples with un-prepared and unscaled features.
Am I missing something major here?
Also I noticed that there are some binaries in the autokeras temporary dir. That dir seems to be generated automatically. Is there a way to use that dir to load the previously-fit autokeras "super" model?
Just using import pickle will do the job - https://github.com/keras-team/autokeras/issues/1081#issuecomment-645508111 :

OpenCV random forest: Setting a random seed

Since there is randomness involved in the computation of a random forest classifier, it is necessary to define a random seed to get reproducible results. How does one do this for OpenCV CvRTrees? I do not see such a parameter in CvRTParams.
Update: The API change of OpenCV 3 removed CvRTParams. However, the title question remains.
it depends on the opencv version you are using.
While 2.4.9 seems to use the global cv::theRNG() , where you can just set theRNG().state = something,
This no longer seems to be possible in opencv3.0

OpenCV - training new LatentSVMDetector Models

I haven't found any method to train new latent svm detector models using openCV. I'm currently using the existing models given in the xml files, but I would like to train my own.
Is there any method for doing so?
Thank you,
Gil.
As of now only DPM-detection is implemented in OpenCV, not training.
If you want to train your own models, the most reliable approach is to use Felzenszwalb's and Girshick's matlab code (most of the heavy stuff is implemented in C) (http://www.cs.berkeley.edu/~rbg/latent/)(http://www.rossgirshick.info/latent/) It is reliable and works reasonably fast
If you want to do it in C-only, there is an implementation here (http://libccv.org/doc/doc-dpm/) that I haven't tried myself.
I think there is a function in the octave version of the author's code here
(Octave Version of DPM). It is in step #5,
mat2opencvxml('./INRIA/inriaperson_final.mat', 'inriaperson_cascade_cv.xml');
I will try it and let you know about the result.
EDIT
I tried to convert the .mat file from the octave version i mentioned before to .xml file, and compared the result with the built in opencv .xml model and the construction of the 2 xmls was different (tags, #components,..), it seems that this version of octave dpm generates xml files for later opencv version (i am using 2.4).
VOC-release3.1 is the one matches opencv2.4.14. I tried to convert the already trained model from this version using mat2xml function available in opencv and the result xml file is successfully loaded and working with opencv. Here are some helpful links:
mat2xml code
VOC-release-3.1
How To Train DPM on a New Object

How to deal with frequent classes?

I'm working on a classification task in Weka and got the problem that my class to predict has one value that is very frequent (about 85%). This leads to a lot of learning algorithms just predicting this frequent value of this class for a new dataset.
How can I deal with this problem? Does it just mean that I didn't find features that work well enough in predicting something better? Or is there something specific I can do to solve this problem?
I guess this is a pretty common problem, but I was not able to find a solution to it here.
You need to "SMOTE" your data. First figure out how many more instances of the minority case you need. In my case I wanted to get around a 50/50 ratio so I needed to over sample by 1300 percent. This tutorial will help if you are using the GUI: http://www.youtube.com/watch?v=w14ha2Fmg6U If you are doing this from the command line using Weka, the following command will get you going:
#Weka 3.7.7
java weka.Run -no-scan weka.filters.supervised.instance.SMOTE \
-c last -K 25 -P 1300.0 -S 1 -i input.arff -o output.arff
The -K option is the number of neighbors to take into account when smoting the data. The default is 5, but 25 worked best for my dataset.

How to reavaluate model in WEKA?

I am trying to solve a numeric classification problem with numeric attributes in WEKA using linear regression and then I want to test my model on the existing dataset with ""re-evaluate model on current test dataset.
As a result of the evaluation I am getting the summary:
Correlation coefficient 0.9924
Mean absolute error 1.1017
Root mean squared error 1.2445
Total Number of Instances 17
But I don't have results as it is shown here: http://weka.wikispaces.com/Making+predictions
How to bring WEKA to the result I need?
Thank you.
To answer my question - for trained and tested model, right click on the model and go to visualize classifier error. there use save option to save actual and predicted values.
Are you using command line interface (CLI) or GUI.
If CLI, the command given in the above link works pretty fine
java weka.classifiers.trees.J48 -T unclassified.arff -l j48.model -p 0
So when you train the model you save it as *.model (j48.model) and later use it to evaluate on test data (unclassified.arff)

Resources