Training a Neural network model on KMeans Clusters - machine-learning

I am classifying a client's clients. However, the data is fluid and the clusters can change every day.
Running new clusters daily to update user clusters is difficult because Kmeans is inconsistent in labeling clusters.
If we cluster, and then train the data say with a Neural networks or XGBoost and moving forward simply predict the clusters. Does this make sense or is it a good way to do things?

Yeah, it does make sense, it's just a regular classification task at that point. You should have enough data assigned to clusters though before moving on to neural network.
On the other hand, why don't you predict clusters for new points instead of updating them (you can see separate methods for fit and predict in sklearn's docs, though it depends on technology you are using)? Remember, that neural network will only be as good as it's input (K-Means clusters) and it's predictions will probably be similiar to K-Means.
Furthermore, NNs are more complicated and harder to train, maybe those shouldn't be you first choice.
You could check the idea of fuzzy clustering as well, as the data is fluid it might be a better fit for your case. Maybe autoencoders, as a method of obtaining latent variables, might be of use as well.

Related

Facial recognition and classifying unknowns with neural networks

As far as I understand, neural networks aren't good at classifying 'unknowns', i.e. items that do not belong to a learned class. But how do face detection/recognition approaches usually determine that no face is detected/recognised in a region? Is the predicted probability somehow thresholded?
Summary
It is true that neural networks are inherently not good at classifying 'unknowns' because they tend to overfit to the data that they have been trained on, if the underlying structure of the neural network is complex enough. However, there are multiple ways to go about reducing the affects of overfitting. For example, one technique that is used for this is called dropout. Another example can be batch normalization. Despite these techniques, the best way to reduce the affects of overfitting is to use more data.
For the facial recognition example that you have given above, it is common that the models that have been trained have 'seen' a huge amount of data. This means that there are very few 'unknowns' and even if there are, the neural network has learned how to tell if there are facial features present or not. This is because certain structures of neural networks are really good at telling if there is a pattern of features present in the input data. This helps the neural networks to learn if the image that is being input has certain features/patterns in it or not. If the these features are found then the input data is classified as face otherwise it is not.

Should the neurons in a neural network be asynchronous?

I am designing a neural network and am trying to determine if I should write it in such a way that each neuron is its own 'process' in Erlang, or if I should just go with C++ and run a network in one thread (I would still use all my cores by running an instance of each network in its own thread).
Is there a good reason to give up the speed of C++ for the asynchronous neurons that Erlang offers?
I'm not sure I understand what you're trying to do. An artificial neural network is essentially represented by the weight of the connections between nodes. The nodes themselves don't exist in isolation; their values are only calculated (at least in feed-forward networks) through the forward-propagation algorithm, when it is given input.
The backpropagation algorithm for updating weights is definitely parallelizable, but that doesn't seem to be what you're describing.
The usefulness of having neurons in a Neural Network (NN), is to have a multi-dimension matrix which coefficients you want to handle ( to train them, to change them, to adapt them little by little, so as they fit well to the problem you want to solve). On this matrix you can apply numerical methods (proven and efficient) so as to find an acceptable solution, in an acceptable time.
IMHO, with NN (namely with back-propagation training method), the goal is to have a matrix which is efficient both at run-time/predict-time, and at training time.
I don't grasp the point of having asynchronous neurons. What would it offers ? what issue would it solve ?
Maybe you could explain clearly what problem you would solve putting them asynchronous ?
I am indeed inverting your question: what do you want to gain with asynchronicity regarding traditional NN techniques ?
It would depend upon your use case: the neural network computational model and your execution environment. Here is a recent paper (2014) by Plotnikova et al, that uses "Erlang and platform Erlang/OTP with predefined base implementation of actor model functions" and a new model developed by the authors that they describe as “one neuron—one process” using "Gravitation Search Algorithm" for training:
http://link.springer.com/chapter/10.1007%2F978-3-319-06764-3_52
To briefly cite their abstract, "The paper develops asynchronous distributed modification of this algorithm and presents the results of experiments. The proposed architecture shows the performance increase for distributed systems with different environment parameters (high-performance cluster and local network with a slow interconnection bus)."
Also, most other answers here reference a computational model that uses matrix operations for the base of training and simulation, for which the authors of this paper compare by saying, "this case neural network model [ie matrix operations based] becomes fully mathematical and its original nature (from neural networks biological prototypes) gets lost"
The tests were run on three types of systems;
IBM cluster is represented as 15 virtual machines.
Distributed system deployed to the local network is represented as 15 physical machines.
Hybrid system is based on the system 2 but each physical machine has four processor cores.
They provide the following concrete results, "The presented results evidence a good distribution ability of gravitation search, especially for large networks (801 and more neurons). Acceleration depends on the node count almost linearly. If we use 15 nodes we can get about eight times acceleration of the training process."
Finally, they conclude regarding their model, "The model includes three abstraction levels: NNET, MLP and NEURON. Such architecture allows encapsulating some general features on general levels and some specific for the considered neural networks features on special levels. Asynchronous message passing between levels allow to differentiate synchronous and asynchronous parts of training and simulation algorithms and, as a result, to improve the use of resources."
It depends what you are after.
2nd Generation of Neural Networks are synchronous. They perform computations on an input-output basis without a delay, and can be trained either through reinforcement or back-propagation. This is the prevailing type of ANN at the moment and the easiest to get started with if you are trying to solve a problem via machine learning, lots of literature and examples available.
3rd Generation of Neural Networks (so-called "Spiking Neural Networks") are asynchronous. Signals propagate internally through the network as a chain-reaction of spiking events, and can create interesting patterns and oscillations depending on the shape of the network. While they model biological brains more closely they are also harder to make use of in a practical setting.
I think that async computation for NNs might prove beneficial for the (recognition) performance. In fact, the result might be similar (maybe less pronounced) to using dropout.
But a straight-forward implementation of async NNs would be much slower, because for synchronous NNs you can use linear algebra libraries, which make good use of vectorization or GPUs.

How to find dynamically the depth of a network in Convolutional Neural Network

I was looking for an automatic way to decide how many layers should I apply to my network depends on data and computer configuration. I searched in web, but I could not find anything. Maybe my keywords or looking ways are wrong.
Do you have any idea?
The number of layers, or depth, of a neural network is one of its hyperparameters.
This means that it is a quantity that can not be learned from the data, but you should choose it before trying to fit your dataset. According to Bengio,
We define a hyper-
parameter for a learning algorithm A as a variable to
be set prior to the actual application of A to the data,
one that is not directly selected by the learning algo-
rithm itself.
There are three main approaches to find out the optimal value for an hyperparameter. The first two are well explained in the paper I linked.
Manual search. Using well-known black magic, the researcher choose the optimal value through try-and-error.
Automatic search. The researcher relies on an automated routine in order to speed up the search.
Bayesian optimization.
More specifically, adding more layers to a deep neural network is likely to improve the performance (reduce generalization error), up to a certain number when it overfits the training data.
So, in practice, you should train your ConvNet with, say, 4 layers, try adding one hidden layer and train again, until you see some overfitting. Of course, some strong regularization techniques (such as dropout) is required.

What is suitable neural network architecture for the prediction of popularity of articles?

I am a newbie in machine learning and also in neural networks. Currently I'm taking a course at coursera.org about neural networks, but I don't understand everything. I have a little problem with my thesis. I should use a neural network, but I don't know how to choose the right neural network architecture for my problem.
I have a lot of data from web portals (typically online editions of newspapers, magazines). There is information about articles for example, name, text of article and release of article. There are also large amounts of sequence data that capture behavior of users.
My goal is to predict the popularity of an article (number of readers or clicks on article by unique user). I want to make vectors from this data and feed my neural network with these vectors.
I have two questions:
1. How do I create the right vector?
2. Which neural network architecture is best suited for this problem?
Those are very broad questions. You'll need to identify smaller issues if you want more exact answers.
How to create a right vector?
For text data, you usually use the vector space model. Best results are often obtained using tf-idf weighting.
Which neural network architecture is suitable for this problem?
This is very hard to say. I would start with a network with k input neurons (where k is the size of your vectors after applying tf-idf: you might also want to do some sort of feature selection to reduce the number of features. A good feature selection method is by using the chi squared test.)
Then, a standard network layout is given by using a single hidden layer with number of neurons equal to the average between the number of input neurons and output neurons. Then it looks like you only need a single output neuron that will output how popular the article is going to be (this can be a linear neuron or a sigmoid neuron).
For the neurons in your hidden layer, you can also experiment with linear and sigmoid neurons.
There are many other things you can try as well: weight decay, the momentum technique, networks with multiple layers, recurrent networks and so on. It's impossible to say what would work best for your given problem without a lot of experimentation.

cluster analysis? label the cluster

I am quite confused about following two problems:
I have a 15 dimensional dataset which should be used to cluster how many types of attacks are contained in the dataset.
1. now i have already clustered my dataset into 5 clusters (5 attacks). Does anyone know how can i point which cluster is which attack? (how to label the clusters not by just "cluster 1,cluster 2...")
2. In supervised classification, we have training dataset and testing dataset, and the testing is conducted with the classifier built from traning dataset. My question is, can the same approach be used for clustering. Like building a model with clustering algorithm, and then automatically classify the new instance into a specific cluster? Is this achievable?
How should an unsupervised method be able to identify named attacks?
The human-assigned name is not in the data!
For some clustering algorithms you can assign new instances automatically, but in general you cannot (not without knowing the model used by the clustering). In the worst case, a new observation would even e.g. merge two clusters into one. What are you going to do then?
If you want classification, use classification, not clustering.
Clustering has a very different mind-set. If you approach it from a classification point of view, you will not really understand it. You use clustering for finding something unknown in data, classification for generalizing something known to new data.
If necessary, you can also train a classifier on your cluster. But don't do this blindly. First make sure that the clusters actually are something useful. It's much easier to come up with a completely meaningless clustering result than with a good clustering. Training a classifier on worthless clusters won't produce a meaningful output.

Resources