I'm a beginner with RNNs.
My current interest is using an LSTM to implement time series classification. I have seen many examples, such as MNIST classification.
However, I am trying to implement an LSTM (using the TensorFlow framework) to model a binary classifier for temperature trend prediction (i.e. up or down).
The following are some details of my experimental setup.
Dataset:
The dataset (Daily minimum temperatures in Melbourne) was obtained from here and has 3650 observations. I divided the data into windows of length 10, which produces 3640 instances. The corresponding labels are one-hot vectors indicating whether the value following the window is higher or lower than the last value of the window, i.e. [1, 0] for increasing and [0, 1] for decreasing.
For example, the dataset contains
1984/7/30,10
1984/7/31,10.6
1984/8/1,11.5
1984/8/2,10.2
1984/8/3,11.1
1984/8/4,11
1984/8/5,8.9
1984/8/6,9.9
1984/8/7,11.7
1984/8/8,11.6
1984/8/9,9
1984/8/10,6.3
1984/8/11,8.7
1984/8/12,8.5
1984/8/13,8.5
1984/8/14,8
1984/8/15,6
...
Two possible training windows would be
(1) [10, 10.6, 11.5, 10.2, 11.1, 11, 8.9, 9.9, 11.7, 11.6] and label: [0, 1],
(2) [10.6, 11.5, 10.2, 11.1, 11, 8.9, 9.9, 11.7, 11.6, 9] and label: [0, 1],
(3) ...
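Here is a minimal sketch of how such windows and one-hot labels could be built (the names temps, window_size, make_windows, X and y are my own, not from the original setup):

import numpy as np

def make_windows(temps, window_size=10):
    # Slide a window over the series and label each window by
    # whether the value right after it goes up or down.
    X, y = [], []
    for i in range(len(temps) - window_size):
        window = temps[i:i + window_size]
        next_value = temps[i + window_size]
        # [1, 0] = increasing, [0, 1] = decreasing (or equal)
        label = [1, 0] if next_value > window[-1] else [0, 1]
        X.append(window)
        y.append(label)
    return np.array(X), np.array(y)

# example with the values listed above
temps = [10, 10.6, 11.5, 10.2, 11.1, 11, 8.9, 9.9, 11.7, 11.6, 9, 6.3]
X, y = make_windows(temps)
print(X.shape, y[0])  # (2, 10) [0 1]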
The problem: My problem is that the model's accuracy wiggles around 49-50%. So my question is: does it make sense to build a binary classification model with a structure like the one described above?
Any help is appreciated.
Thanks.
Related
As far as I know, a deep learning model is trained on such inputs to produce an output; the trained model is then used to predict the output for a novel input of the same length as the inputs used in the training stage, and the predicted output also has the same length as the output vectors used in the training stage. My concern is a little bit different.
Given a deep learning model, assume the input is a vector y of length N and the output is a vector x of length M. Can the deep learning model be trained on the input vector y until some values of x are correct, and then be used to predict the other values of x? How can we do that, i.e. which process can be followed in that case?
For example, I have a random vector y of size 50 x 1, and the output is the vector x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Can the deep learning model be trained on the vector y until the first four values of x are [0, 1, 2, 3], and then be used to predict the other values of the vector x? It is very important to mention that the values of the output vector depend on each other, so getting part of them right could yield the other values too.
I tried to follow the conventional way of doing this, but I found that the same input/output sizes must be used during the training and testing stages, while what I am looking for is a little bit different.
I was wondering whether you could train a neural network model that predicts a scalar value from a vector of parabolic-shaped values.
For example:
let's say the input vector is [5, 10, 15, 20, 22, 25, 22, 15, 10, 5]; then the output should be 23.
And to train it, I would just give the model lots of input vectors like the one in the example, along with the value that should be returned for each of these vectors.
I looked it up on the internet but didn't find anything matching my case; however, I'm a newbie at this, so maybe I just don't understand certain algorithms.
I want to know what to do after I have done the binning. For example, one of the features is age, so my data is [11, 12, 35, 26].
Then I apply binning with a bin size of 10:
bin, name
[0, 10) --> 1
[10, 20) --> 2
[20, 30) --> 3
[30, 40) --> 4
Then my data becomes [2, 2, 4, 3]. Now assume I want to feed this data to a linear regression model. Should I treat [2, 2, 4, 3] as a numerical feature? Or should I treat it as a categorical feature, i.e. do one-hot encoding first and then feed it to the model?
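For reference, this binning step can be reproduced with pandas.cut; a small sketch (the variable names are illustrative):

import pandas as pd

ages = pd.Series([11, 12, 35, 26])
# bins: [0, 10) -> 1, [10, 20) -> 2, [20, 30) -> 3, [30, 40) -> 4
binned = pd.cut(ages, bins=[0, 10, 20, 30, 40], right=False, labels=[1, 2, 3, 4])
print(binned.tolist())  # [2, 2, 4, 3]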
If you are building a linear model, then one-hot encoding of those bins might be a better option, so that if there is any linear relationship with the target, the one-hot encoding will preserve it.
If you are building tree-based models, like random forests, then you could use [2, 2, 4, 3] as a numerical feature, because these models are non-linear.
If you are building a regression model and do not want to expand the feature space with one-hot encoding, you could treat the bins as a categorical variable and encode that variable using mean/target encoding, or encode it with digits ordered by the target mean per bin.
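For the one-hot-encoding route, a minimal sketch could look like this (the column name age_bin is illustrative):

import pandas as pd

# binned ages from the question above
df = pd.DataFrame({'age_bin': [2, 2, 4, 3]})

# one-hot encode the bins before feeding them to a linear model
ohe = pd.get_dummies(df['age_bin'], prefix='age_bin', dtype=int)
print(ohe)
#    age_bin_2  age_bin_3  age_bin_4
# 0          1          0          0
# 1          1          0          0
# 2          0          0          1
# 3          0          1          0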
More details about the last 2 procedures in this article.
Disclaimer: I wrote the article.
I'm trying to perform sentiment analysis on the Twitter dataset "Sentiment140", which consists of 1.6 million labelled tweets. I'm constructing my feature vectors using a Bag of Words (unigram) model, so each tweet is represented by about 20000 features. Now, to train my sklearn model (SVM, Logistic Regression, Naive Bayes) on this dataset, I have to load the entire 1.6m x 20000 feature matrix into one variable and then feed it to the model. Even on my server machine, which has a total of 115GB of memory, this causes the process to be killed.
So I wanted to know if I can train the model instance by instance, rather than loading the entire dataset into one variable.
If sklearn does not have this flexibility, are there any other libraries you could recommend (which support sequential learning)?
It is not really necessary (let alone efficient) to go to the other extreme and train instance by instance; what you are looking for is actually called incremental or online learning, and it is available in scikit-learn's SGDClassifier for linear SVM and logistic regression, which indeed contains a partial_fit method.
Here is a quick example with dummy data:
import numpy as np
from sklearn import linear_model

# first batch of training data
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])

clf = linear_model.SGDClassifier(max_iter=1000, tol=1e-3)
# the first call to partial_fit must be told about all classes
clf.partial_fit(X, Y, classes=np.unique(Y))

# a second batch, fitted incrementally on top of the first
X_new = np.array([[-1, -1], [2, 0], [0, 1], [1, 1]])
Y_new = np.array([1, 1, 2, 1])
clf.partial_fit(X_new, Y_new)
The default values for the loss and penalty arguments ('hinge' and 'l2' respectively) are those of a LinearSVC, so the above code essentially fits a linear SVM classifier with L2 regularization incrementally; these settings can of course be changed - check the docs for more details.
It is necessary to include the classes argument in the first call, which should contain all the existing classes in your problem (even though some of them might not be present in some of the partial fits); it can be omitted in subsequent calls of partial_fit - again, see the linked documentation for more details.
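For your case, you could then stream the tweets in mini-batches instead of loading everything at once; a rough sketch (load_batches is a hypothetical generator standing in for whatever yields one chunk of Bag-of-Words features and labels at a time):

import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss='hinge', penalty='l2')  # linear-SVM-like behaviour
classes = np.array([0, 1])  # e.g. negative / positive sentiment

# load_batches() is hypothetical: yield (X_batch, y_batch) for a few
# thousand tweets at a time, so memory stays bounded
for X_batch, y_batch in load_batches():
    clf.partial_fit(X_batch, y_batch, classes=classes)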
I have tried using OneVsRestClassifier with LogisticRegression from sklearn, but it gives empty labels for some samples (i.e. it doesn't predict any class), even though I do not have any unlabelled training data.
Any idea what might be causing this or how to fix this?
clf = OneVsRestClassifier(LogisticRegression(multi_class='ovr', max_iter=1000, solver='lbfgs'))
clf.fit(X,Y)
self.classifier=clf
self.classifier.predict(test_data)
Whenever you are performing multilabel classification, according to the OneVsRestClassifier documentation the targets need to be "a sequence of sequences of labels".
Moreover, depending on how you encode these labels you may get the following warning: "DeprecationWarning: Direct support for sequence of sequences multilabel representation will be unavailable from version 0.17. Use sklearn.preprocessing.MultiLabelBinarizer to convert to a label indicator representation."
So, a neat way to encode your labels is:
from sklearn import preprocessing
mlb = preprocessing.MultiLabelBinarizer()
Y = mlb.fit_transform([(1, 2), (1,2), (1,2),(4,)])
# this means sample one belongs to classes {1,2} and so on.
# Take into account the format if only one class is needed, (4,) not (4)
so Y turns out to be:
array([[1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [0, 0, 1]])
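You can then fit the classifier on this label indicator matrix and map predictions back to the original labels with inverse_transform; a minimal sketch, reusing mlb and Y from above (X here is placeholder feature data, not from the original post):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.random.rand(4, 5)  # placeholder features, one row per sample
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000, solver='lbfgs'))
clf.fit(X, Y)

pred = clf.predict(X)                # 0/1 indicator rows, same columns as Y
print(mlb.inverse_transform(pred))   # back to tuples of original labels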