Is there a branch of SKLearn with robust parallelism?

The SKLearn machine learning library is extremely fragile when using its built-in parallelism (setting n_jobs=-1). Are there patches, branches, or alternatives here?
Examples of multicore fragility:
https://github.com/scikit-learn/scikit-learn/issues/5115
https://github.com/joblib/joblib/issues/138
scikit-learn's GridSearchCV stops working when n_jobs>1
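For context, here is a minimal sketch of the pattern the linked issues describe, together with the workarounds most often suggested there (guarding the entry point and forcing a different joblib backend). The estimator, parameter grid, and backend choice are illustrative assumptions, not an official fix from the scikit-learn developers.

```python
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def run_search():
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=3,
        n_jobs=-1,  # the setting that reportedly hangs on some setups
    )
    # Forcing the threading backend avoids joblib's multiprocessing layer entirely,
    # at the cost of GIL contention for estimators that do not release it.
    with parallel_backend("threading"):
        grid.fit(X, y)
    print(grid.best_params_)

if __name__ == "__main__":  # guard required when joblib spawns worker processes
    run_search()
```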

Related

parallelized deep reinforcement learning

I am trying to run DRL on a slow environment, and sequential learning is making me upset. Is there any way to speed up the learning process? I tried some offline deep reinforcement learning, but I still need higher speed (if possible).
You are looking for Vectorized Environments. They will allow parallel interaction with your environments.
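As a rough illustration, here is a minimal sketch of a vectorized environment using Gymnasium's vector API; the environment id and the number of workers are arbitrary assumptions, and Stable-Baselines3's make_vec_env provides the same idea for its algorithms.

```python
import gymnasium as gym

# Run 8 copies of the environment in separate worker processes; each call to
# step() advances all of them at once with a batch of actions.
envs = gym.vector.AsyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(8)]
)

obs, info = envs.reset(seed=0)
for _ in range(1000):
    actions = envs.action_space.sample()  # batched random actions as a placeholder policy
    obs, rewards, terminations, truncations, infos = envs.step(actions)
envs.close()
```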

Is there a native library written in Julia for Machine Learning?

I have started using Julia. I read that it is faster than C.
So far I have seen some libraries like Knet and Flux, but both are for Deep Learning.
There is also a package, PyCall, to use Python inside Julia.
But I am interested in Machine Learning too, so I would like to use SVM, Random Forest, KNN, XGBoost, etc., but in Julia.
Is there a native library written in Julia for Machine Learning?
Thank you
A lot of algorithms are plainly available through dedicated packages, like BayesNets.jl.
For "classical machine learning" there is MLJ.jl, a pure-Julia machine learning framework written by the Alan Turing Institute and under very active development.
For neural networks, Flux.jl is the way to go in Julia. It is also very active, GPU-ready, and allows all the exotic combinations that exist in the Julia ecosystem, like DiffEqFlux.jl, a package that combines Flux.jl and DifferentialEquations.jl.
Also keep an eye on Zygote.jl, a source-to-source automatic differentiation package that will serve as a backend for Flux.jl.
Of course, if you're more comfortable with Python ML tools, you still have TensorFlow.jl and ScikitLearn.jl, but the OP asked for pure Julia packages and those are just Julia wrappers of Python packages.
Have a look at this kNN implementation and this one for XGBoost.
There are SVM implementations, but they are outdated and unmaintained (search for SVM.jl). But, really, consider other algorithms for much better prediction quality and model-construction performance. Have a look at the OLS (orthogonal least squares) and OFR (orthogonal forward regression) algorithm family. You will easily find detailed algorithm descriptions that are easy to code in any suitable language.
However, there is currently no Julia implementation I am aware of. I found only Matlab implementations and made my own Java implementation some years ago. I have plans to port it to Julia, but that currently has no priority and may take some years. Meanwhile, why not code it yourself? You won't find any other language that makes it easier to code a prototype and turn it into a highly efficient production algorithm running heavy loads on a CUDA-enabled GPGPU.
To start with, I recommend this fairly recent publication: Nonlinear identification using orthogonal forward regression with nested optimal regularization

Using Machine Learning for Price Prediction

What machine learning method should I use to predict prices of stocks, gold, etc.?
I prefer using Python, but I can't find the starting point, as it seems so complicated to me and I have no clue how to start.
As for the machine learning method: regression is used for price prediction, since the goal is to predict a continuous variable. There is a wide range of regression techniques in machine learning, from simple linear regression through SVR, Random Forest, and CatBoost to RNNs. Based on the target problem, the available datasets, and the computing resources, one of these algorithms can be chosen.
Yes, Python is the best language to get started in machine learning. And definitely, linear regression is the best way to start on this regression task if you are new. Gradually, you can explore other techniques in scikit-learn before jumping directly into RNNs. Scikit-learn is the best machine learning library for everyone from beginners to professionals.
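To make that starting point concrete, here is a minimal sketch of the kind of workflow the answers describe: fitting a scikit-learn linear regression on lagged prices. The synthetic price series and the lag-feature setup are illustrative assumptions, not a recipe taken from the answers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic "price" series standing in for real stock or gold prices.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=500))

# Simple lag features: predict today's price from the previous 5 prices.
n_lags = 5
X = np.column_stack([prices[i:len(prices) - n_lags + i] for i in range(n_lags)])
y = prices[n_lags:]

# Keep the time order: train on the earlier part, test on the later part.
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False, test_size=0.2)

model = LinearRegression().fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```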

Why use TensorFlow for Convolutional Neural Networks

I recently took a course by Andrew Ng on Coursera. After that I shifted to Python and used Pandas, NumPy, and Sklearn to implement ML algorithms. Then, while surfing, I came across TensorFlow, found it pretty amazing, and implemented this example which takes MNIST data as input.
But I am unsure why one would use such a library (TensorFlow).
We are not doing any parallel calculations, since the weights updated in the previous epoch are used in the next one, are we?
I am finding it difficult to find a reason to use such a library.
There are several forms of parallelism that TensorFlow provides when training a convolutional neural network (and many other machine learning models), including:
Parallelism within individual operations (such as tf.nn.conv2d() and tf.matmul()). These operations have efficient parallel implementations for multi-core CPUs and GPUs, and TensorFlow uses these implementations wherever available.
Parallelism between operations. TensorFlow uses a dataflow graph representation for your model, and where there are two nodes that aren't connected by a directed path in the dataflow graph, these may execute in parallel. For example, the Inception image recognition model has many parallel branches in its dataflow graph (see figure 3 in this paper), and TensorFlow can exploit this to run many operations at the same time. The AlexNet paper also describes how to use "model parallelism" to run operations in parallel on different parts of the model, and TensorFlow supports that using the same mechanism.
Parallelism between model replicas. TensorFlow is also designed for distributed execution. One common scheme for parallel training ("data parallelism") involves sharding your dataset across a set of identical workers, performing the same training computation on each of those workers for different data, and sharing the model parameters between the workers.
In addition, libraries like TensorFlow and Theano can perform various optimizations when they can work with the whole dataflow graph of your model. For example, they can eliminate common subexpressions, avoid recomputing constant values, and generate more efficient fused code.
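As a rough illustration of the third form (data parallelism between model replicas), here is a minimal sketch using the modern TF 2.x Keras API; the original answer predates this API, so treat the specific calls as an assumption about how the idea would be expressed today rather than what the answer itself used.

```python
import tensorflow as tf

# MirroredStrategy keeps one replica of the model per available device and
# averages the gradients across replicas after each batch.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# MNIST digits, rescaled to [0, 1] with an explicit channel dimension.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

# With multiple GPUs each batch is split across the replicas automatically.
model.fit(x_train, y_train, batch_size=256, epochs=1)
```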
You might be able to find pre-baked models in sklearn or other libraries, but TensorFlow allows for really fast iteration of custom machine learning models. It also comes with a ton of useful functions that you would have to (and probably shouldn't) write yourself.
To me, it's less about performance (though they certainly care about performance), and more about whipping out neural networks really quickly.

How does the SHOGUN Toolbox convolutional neural network compare to Caffe and Theano?

I'm interested in implementing a convolutional neural network in my C++ program where I'm tracking tagged insects (I'm also using OpenCV). I see people mention Caffe, Torch and Theano a lot but I haven't heard the CNN in the SHOGUN Toolbox discussed. Does this CNN work well and would anyone recommend it if you're working in C++? I've used Theano via scikit-neuralnetwork in Python to test out some images and that worked really well, except unfortunately Theano is Python-only.
Shogun also has GPU support for some of the operations used in the NN code. This is work in progress, though; at this point in time, other libraries might be faster. We mostly built these networks in order to be able to easily compare them to the other algorithms in the toolbox.
The advantage, however, is that you can use it from a large number of languages (while internally, C++ code is executed) -- useful if you don't want to use Python.
Here are some IPython notebooks that you could use as a basis to compare:
autoencoders for denoising and classification
(convolution) networks for digit classification
We appreciate any experience being shared. Shogun is in constant development, and the NNs in particular attract a lot of people to work on them, so expect things to change. If you are interested in helping to GPU-fy Shogun, please let us know.
The difference lies in speed. CNNs are computationally expensive, so a GPU implementation is at least 10 times faster than a CPU one. Caffe and Theano provide seamless integration for running on either CPU or GPU, which may not be easy for you to implement yourself without much GPU programming experience.
Other factors exist as well, such as a unified interface for multilayer networks, stochastic gradient descent, etc., but I think the speed issue is the most crucial among them.

Resources