What machine learning algorithm could be better for this scenario?

I have a dataset comprised of roughly 15M observations, with approximately 3% of them belonging to the class of interest. I can train the model on a PC, but I need to deploy the classifier on a Raspberry Pi 3. Since the Raspberry Pi has such limited memory, which algorithms would put the least load on it?
Additional info: the dataset is hard to separate. For example, ANNs can't get past an 80% detection rate for the class of interest, no matter the architecture or activation function. Random forest has shown great performance, but the number of trees and nodes required isn't feasible for an implementation on a microcontroller.
Thank you in advance.

You could trim the trees in the Random Forest approach to balance classifier performance against the memory and processing power requirements.
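For illustration, here is a minimal scikit-learn sketch of that idea (the synthetic dataset and parameter values are placeholders, not tuned for the asker's data): limiting the number of trees, their depth, or their leaf count directly bounds the memory the fitted forest needs.

```python
# Minimal sketch (scikit-learn assumed; parameter values are illustrative):
# constraining tree count and depth keeps the fitted forest small enough
# for a memory-limited target like a Raspberry Pi.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the real data: imbalanced binary problem (~3% positives).
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.97, 0.03], random_state=0)

clf = RandomForestClassifier(
    n_estimators=20,          # far fewer trees than an unconstrained forest
    max_depth=8,              # cap depth to bound the node count per tree
    max_leaf_nodes=256,       # alternative hard ceiling on model size
    class_weight="balanced",  # partially compensates for the rare class
    random_state=0,
)
clf.fit(X, y)
print(sum(t.tree_.node_count for t in clf.estimators_), "total nodes")
```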
Also, I suspect you have strongly imbalanced train/test sets, so I wonder if you used any of the approaches suggested for this case (e.g. SMOTE, ADASYN, etc.). For Python, I strongly suggest reviewing the imbalanced-learn library. Using such an approach could lead to a smaller classifier with acceptably good performance that you would be able to run on the target device.
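A hedged sketch of combining the two suggestions, assuming scikit-learn and imbalanced-learn are available (the synthetic dataset and sizes below are stand-ins for the real 15M-row data):

```python
# Oversample the minority class with SMOTE on the training split only,
# then fit a deliberately small forest. Numbers are placeholders.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Resample only the training data; the test split keeps the true imbalance.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

small_clf = RandomForestClassifier(n_estimators=15, max_depth=6, random_state=0)
small_clf.fit(X_res, y_res)

# Accuracy is misleading at 3% positives, so look at minority-class recall.
print("interest-class recall:", recall_score(y_te, small_clf.predict(X_te)))
```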
Last but not least, this question could easily go to the Cross Validated or Data Science sites.

Related

Do GPUs provide a noticeable improvement for ML predictions? (not training)

I am interested in improving performance with regard to ML predictions (I don't care about training).
- Will GPUs provide more throughput or lower latency?
- Are they good for batch or online serving?
- What types of models would be most impacted by using GPUs?
Disclaimer: The real answer is "it depends; if such a decision is important to you, you should benchmark CPU performance against GPU performance on your target systems to make an informed decision." The rest of this answer is just advice to loosely guide your decision when you don't want to (or don't have time to) do any benchmarking.
In a research environment, predictions are often (though not always) done in batches. As such, even if the model is entirely serial (i.e. there is an execution dependency between every pair of operations), it will likely still benefit from parallelization in that those serial operations may have to be replicated for multiple query points simultaneously, and so you can parallelize predictions across query points within a batch. So if your prediction setting involves batches, you should pretty much always use a GPU. From my own research experiences, a GPU is always faster than a CPU in batched prediction settings, regardless of the model used.
If you are only making a single prediction at a time (e.g. an "online" prediction setting), most modern ML methods are still highly parallelizable in general. In a neural network, for instance, there are only execution dependencies between layers; there are no execution dependencies between nodes within a layer. If you have many nodes per layer (which most modern deep learning architectures do), then your model is likely very parallelizable and can benefit from using a GPU instead of a CPU.
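As a rough illustration of both points (the batch dimension across query points and the lack of dependencies between nodes within a layer), here is a minimal NumPy sketch of a two-layer forward pass; a GPU library would execute the same matrix products, just spread across many more cores:

```python
# Sketch of why prediction parallelizes well: every query point in the
# batch, and every node within a layer, is handled by the same matrix
# product. NumPy on CPU shown; GPU code would use identical shapes.
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(512, 100))       # 512 query points, 100 features

W1, b1 = rng.normal(size=(100, 256)), np.zeros(256)   # layer 1 weights
W2, b2 = rng.normal(size=(256, 10)), np.zeros(10)     # layer 2 weights

h = np.maximum(batch @ W1 + b1, 0.0)      # all 512 x 256 activations at once
out = h @ W2 + b2                         # only dependency: layer order
print(out.shape)                          # (512, 10)
```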
Naive Bayes classifiers make predictions by computing a bunch of (supposedly) conditionally independent probabilities, which can be parallelized, and then multiplying them together, which can be parallelized via reduction. As such, they may also benefit from using a GPU instead of a CPU.
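A toy sketch of that structure, with made-up numbers, using log-probabilities so the product becomes a sum reduction:

```python
# Naive Bayes prediction structure: per-feature log-likelihood terms are
# independent (parallelizable), and combining them is a sum reduction.
import numpy as np

log_prior = np.log(np.array([0.6, 0.4]))            # two classes
# log P(feature_j | class_c), shape (classes, features); illustrative values
log_likelihood = np.log(np.array([[0.2, 0.5, 0.3],
                                  [0.4, 0.1, 0.5]]))
x = np.array([1, 0, 2])                              # observed feature counts

# Each term x_j * log P(x_j | c) is independent; the sum is a reduction.
scores = log_prior + log_likelihood @ x
print("predicted class:", scores.argmax())
```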
For a support vector machine with the dual problem approach, making a prediction requires computing an inner product (kernel trick) for each training data point with the query point, and multiplying each inner product by the corresponding parameters and target binary labels. This can very easily be parallelized in a similar way to naive Bayes classifiers.
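A minimal NumPy sketch of that dual-form decision function, with placeholder support vectors, dual coefficients, and an RBF kernel chosen purely for illustration:

```python
# Dual-form SVM prediction: one kernel evaluation per support vector, each
# scaled by alpha_i * y_i, then a sum reduction plus a bias term.
import numpy as np

rng = np.random.default_rng(0)
support_vectors = rng.normal(size=(1000, 20))   # x_i
alphas = rng.uniform(size=1000)                 # dual coefficients
labels = rng.choice([-1.0, 1.0], size=1000)     # y_i
bias = 0.1
query = rng.normal(size=20)

gamma = 0.5
# RBF kernel of the query against every support vector (parallel per vector).
k = np.exp(-gamma * ((support_vectors - query) ** 2).sum(axis=1))
decision = (alphas * labels * k).sum() + bias   # reduction
print("predicted label:", np.sign(decision))
```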
The list goes on. The point is, most ML methods are at least relatively conducive to parallelization even if you're processing a single query point at a time, and extremely conducive to parallelization if you're processing query points in batch. This makes them generally run faster on the "average" GPU than the "average" CPU.
But ultimately, it depends on your model and target system, so if it matters that much to you, you should benchmark to make an informed decision.

Why use a support vector machine?

I have some questions about SVMs:
1- Why use SVMs? In other words, what caused them to appear?
2- What is the state of the art (2017)?
3- What improvements have been made to them?
SVMs work very well. In many applications, they are still among the best-performing algorithms.
We've seen some progress in particular on linear SVMs, which can be trained much faster than kernel SVMs.
Read more literature. Don't expect an exhaustive answer in this QA format. Show more effort on your behalf.
SVMs are most commonly used for classification problems where labeled data is available (supervised learning), and they are useful for modeling with limited data. For problems with unlabeled data (unsupervised learning), support vector clustering is a commonly employed algorithm. SVMs tend to perform better on binary classification problems, since the decision boundaries will not overlap. Your 2nd and 3rd questions are very ambiguous (and need lots of work!), but suffice it to say that SVMs have found wide-ranging applicability in medical data science. Here's a link to explore more about this: Applications of Support Vector Machine (SVM) Learning in Cancer Genomics
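For the common supervised case, here is a minimal sketch with scikit-learn's SVC on a small synthetic dataset (an assumed setup, not tied to any particular application):

```python
# Minimal supervised binary classification with a kernel SVM.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)    # RBF kernel SVM with default-ish settings
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```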

One particular dataset is a benchmark for image classification in the computer vision and machine learning literature

Whenever I read articles about datasets (like MNIST and CIFAR-10), I mostly find statements like:
CIFAR-10 is standard benchmark dataset for image classification in the computer vision and machine learning literature.
In particular, if I talk about convolutional neural networks (deep learning), then, as far as I know, the accuracy of a network architecture depends on the dataset used.
I am really confused by the statement quoted above.
What exactly does that statement mean?
Thanks in advance if someone can help me by giving a real-life example.
In this context, "benchmark" is defined on Wikipedia as:
In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.
In the computer vision community, some researchers focus their efforts on improving image classification performance on benchmark datasets. There are many benchmark datasets, e.g. MNIST, CIFAR-10, ImageNet. The existence of benchmark datasets means researchers can more easily compare the performance of their proposed method against existing methods. The researchers know that (generally speaking) everyone who benchmarks their method on this dataset has access to the same input data.
In some cases, a dataset provider keeps a leaderboard of the best-performing results on their benchmark dataset. For example, ImageNet keeps a record of the performance of the methods submitted to the benchmark contest each year. Here is an example of the best-performing methods for the 2017 ImageNet object classification task.
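Concretely, the benchmark aspect is just that everyone evaluates on the same fixed split. Here is a small sketch, assuming a Keras-style CIFAR-10 loader is available (any loader exposes the same canonical split):

```python
# "Benchmark dataset" in practice: everyone loads the same fixed train/test
# split, so reported test accuracies are directly comparable across papers.
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, x_test.shape)   # (50000, 32, 32, 3) (10000, 32, 32, 3)

# A paper's "CIFAR-10 accuracy" is its model's accuracy on this same
# 10,000-image test set, which is what makes cross-paper comparison fair.
```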

Should the neurons in a neural network be asynchronous?

I am designing a neural network and am trying to determine if I should write it in such a way that each neuron is its own 'process' in Erlang, or if I should just go with C++ and run a network in one thread (I would still use all my cores by running an instance of each network in its own thread).
Is there a good reason to give up the speed of C++ for the asynchronous neurons that Erlang offers?
I'm not sure I understand what you're trying to do. An artificial neural network is essentially represented by the weight of the connections between nodes. The nodes themselves don't exist in isolation; their values are only calculated (at least in feed-forward networks) through the forward-propagation algorithm, when it is given input.
The backpropagation algorithm for updating weights is definitely parallelizable, but that doesn't seem to be what you're describing.
The usefulness of having neurons in a Neural Network (NN) is to get a multi-dimensional matrix whose coefficients you can handle (train them, change them, adapt them little by little, so that they fit the problem you want to solve). To this matrix you can apply proven, efficient numerical methods so as to find an acceptable solution in an acceptable time.
IMHO, with NNs (namely with the back-propagation training method), the goal is to have a matrix which is efficient both at run/prediction time and at training time.
I don't grasp the point of having asynchronous neurons. What would it offer? What issue would it solve?
Maybe you could explain clearly what problem you would solve by making them asynchronous?
I am indeed inverting your question: what do you want to gain with asynchronicity compared to traditional NN techniques?
It would depend upon your use case: the neural network computational model and your execution environment. Here is a recent paper (2014) by Plotnikova et al, that uses "Erlang and platform Erlang/OTP with predefined base implementation of actor model functions" and a new model developed by the authors that they describe as “one neuron—one process” using "Gravitation Search Algorithm" for training:
http://link.springer.com/chapter/10.1007%2F978-3-319-06764-3_52
To briefly cite their abstract, "The paper develops asynchronous distributed modification of this algorithm and presents the results of experiments. The proposed architecture shows the performance increase for distributed systems with different environment parameters (high-performance cluster and local network with a slow interconnection bus)."
Also, most other answers here reference a computational model that uses matrix operations as the basis of training and simulation, which the authors of this paper contrast by saying, "this case neural network model [ie matrix operations based] becomes fully mathematical and its original nature (from neural networks biological prototypes) gets lost"
The tests were run on three types of systems:
1- An IBM cluster, represented as 15 virtual machines.
2- A distributed system deployed on the local network, represented as 15 physical machines.
3- A hybrid system, based on system 2, but where each physical machine has four processor cores.
They provide the following concrete results, "The presented results evidence a good distribution ability of gravitation search, especially for large networks (801 and more neurons). Acceleration depends on the node count almost linearly. If we use 15 nodes we can get about eight times acceleration of the training process."
Finally, they conclude regarding their model, "The model includes three abstraction levels: NNET, MLP and NEURON. Such architecture allows encapsulating some general features on general levels and some specific for the considered neural networks features on special levels. Asynchronous message passing between levels allow to differentiate synchronous and asynchronous parts of training and simulation algorithms and, as a result, to improve the use of resources."
It depends what you are after.
2nd Generation of Neural Networks are synchronous. They perform computations on an input-output basis without a delay, and can be trained either through reinforcement or back-propagation. This is the prevailing type of ANN at the moment and the easiest to get started with if you are trying to solve a problem via machine learning, lots of literature and examples available.
3rd Generation of Neural Networks (so-called "Spiking Neural Networks") are asynchronous. Signals propagate internally through the network as a chain-reaction of spiking events, and can create interesting patterns and oscillations depending on the shape of the network. While they model biological brains more closely they are also harder to make use of in a practical setting.
I think that async computation for NNs might prove beneficial for (recognition) performance. In fact, the result might be similar (maybe less pronounced) to using dropout.
But a straightforward implementation of async NNs would be much slower, because for synchronous NNs you can use linear algebra libraries, which make good use of vectorization or GPUs.
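A rough NumPy illustration of that last point: the synchronous, vectorized layer update and a per-neuron loop (a stand-in for the "one neuron, one process" view) compute the same thing, but only the former maps directly onto optimized linear algebra:

```python
# Synchronous/vectorized layer update vs. a per-neuron loop.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1024)                 # layer input
W = rng.normal(size=(1024, 1024))         # weights into the next layer

# Vectorized: one matrix-vector product for the whole layer (BLAS/GPU friendly).
out_vec = np.maximum(W @ x, 0.0)

# Per-neuron view: each "neuron" computes its own dot product independently.
out_loop = np.array([max(float(w_row @ x), 0.0) for w_row in W])

print(np.allclose(out_vec, out_loop))     # same result, very different speed
```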

How easy/fast are support vector machines to create/update?

If I provided you with data sufficient to classify a bunch of objects as either apples, oranges or bananas, how long might it take you to build an SVM that could make that classification? I appreciate that it probably depends on the nature of the data, but are we more likely talking hours, days or weeks?
Ok. Now that you have that SVM, and you have an understanding of how the data behaves, how long would it likely take you to upgrade that SVM (or build a new one) to classify an extra class (tomatoes) as well? Seconds? Minutes? Hours?
The motivation for the question is trying to assess the practical suitability of SVMs to a situation in which not all data is available to be sampled at any time. Fruit are an obvious case - they change colour and availability with the season.
If you would expect SVMs to be too fiddly to create on demand within 5 minutes, despite experience with the problem domain, then suggestions for a more user-friendly kind of classifier for such a situation would be appreciated.
Generally, adding a class to a one-vs-many SVM classifier requires retraining all classes. In the case of large data sets, this might turn out to be quite expensive. In the real world, when facing very large data sets, if performance and flexibility are more important than state-of-the-art accuracy, Naive Bayes is quite widely used (adding a class to an NB classifier requires training the new class only).
However, according to your comment, which states the data has tens of dimensions and up to thousands of samples, the problem is relatively small, so in practice an SVM retrain can be performed very fast (probably on the order of seconds to tens of seconds).
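If you want a feel for the numbers, here is a hedged timing sketch in that "tens of dimensions, thousands of samples" regime, with purely synthetic data (your own data will give different timings):

```python
# Time an SVM retrain from scratch after adding a class; sizes are illustrative.
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# 3-class problem (apples / oranges / bananas), then a 4th class (tomatoes).
X3, y3 = make_classification(n_samples=3000, n_features=30, n_informative=10,
                             n_classes=3, random_state=0)
X4, y4 = make_classification(n_samples=4000, n_features=30, n_informative=10,
                             n_classes=4, random_state=0)

for name, X, y in [("3 classes", X3, y3), ("4 classes (retrain)", X4, y4)]:
    start = time.perf_counter()
    SVC(kernel="rbf").fit(X, y)
    print(name, f"{time.perf_counter() - start:.2f} s")
```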
You need to give us more details about your problem, since there are too many different scenarios: an SVM can be trained fairly quickly (I could train one in real time in a third-person shooter game without any latency), or training could take anywhere from several minutes to hours (I have a face-detector case where training took an hour).
As a rule of thumb, the training time is proportional to the number of samples and the dimension of each vector.
