Different number of inputs causes sematic problem in NN during training?

Different number of inputs causes sematic problem in NN during training? - machine-learning

I would like to forecast stock price. Sometimes I have 3 input data, sometimes 5 or 7. ( Sometimes there are 3 limit orders around price change, sometimes 5 ).
I would like to train a NN to predict a number. The number will be predicted based on 3-5 inputs. The problem is that sometimes I have 3 inputs, sometimes 4 or 5.
Lets say sometime:
Input:
[ 0.1, 0.3, 0.5, 0.7, 0.1 ]
Output:
0.7
And another time:
Input:
[ 0.4, 0.1, 0.6, ? , ? ]
Output:
0.5
What should I write at the question mark? null or 0 or undefined?
I try to use the very basic NN network of brain.js and I know my model is not optimised for forecasting but please ignore this now.
I think if I would use 0 when I don't have enough input, then it would cause semantic problem because the NN would think that the 0 input has an actual meaning regarding the input data and will handle it as an actual valid data.
Edit:
Is it possibble to mix the type of input?
For example I could have an extra input field which would be boolean type and it would tell that the next input element should be treat or not.
If I would have 5 information to predict, then I would make a 7 element input array with 2 extra input field, which tell that 0.5 and 0.7 should be treaten as numbers:
[0.1 , 0.4 , 0.1 , BOOLEAN-YES , 0.5 , BOOLEAN-YES , 0.7 ];
If I would have 3 information, then I would make also a 7 element input with also 2 extra input field, which tell that 0 and 0 should be ignore and NOT treat as numbers:
[0.1 , 0.4 , 0.1 , BOOLEAN-NO , 0 , BOOLEAN-NO , 0 ];
With this method I could manage that every input would be same size. Could it be done? How could I do this?
Boolean-no and Boolean-yes could be simply 0 and 1?

Related

How to remove the outliers using Python

I am doing a binary classification problem, I am struggling with removing outliers and also increasing accuracy.
Ratings are one my feature looks like this:
0 0.027465
1 0.027465
2 0.027465
3 0.027465
4 0.027465
...
26043 0.027465
26044 0.027465
26045 0.102234
26046 0.027465
26047 0.027465
mean value of the data:
train.ratings.mean()
0.03871552285960927
std of the data:
train.ratings.std()
0.07585168664836195
I tried the log transformation but accuracy is not increased:
train['ratings']=np.log(train.ratings+1)
my goal is to classify the data true or false:
train.netgain
0 False
1 False
2 False
3 False
4 True
...
26043 True
26044 False
26045 True
26046 False
26047 Fals

One method I used was to calculate a MAD and after that I tag all outlier with a bool type with that I can get all outliers.
Sample of MAD calculation:
def mad(x):
return np.median(np.abs(x - np.median(x)))
def mad_ratio(x):
mad_value = mad(x)
if mad_value == 0:
return 0
x_mad = np.abs(x - np.median(x)) / mad_value
return x_mad

Assume that the rating feature is normally distributed and convert it to the standard normal distribution
From normal distribution, we know 99.7% values are covered with 3 standard deviations. so we can remove the values which are above 3 standard deviations away from the mean.
.**
See below for python code.
ratings_mean=train['ratings'].mean() #Finding the mean of ratings column
ratings_std=train['ratings'].std() # standard deviation of the column
train['ratings']=train['ratings'].map(lamdba x: (x - ratings_mean)/ ratings_std
Ok, now we have now converted our data into a standard normal distribution. Now we if you see, its mean should be 0 and the standard deviation should be 1. From this, we can find out which are greater than 3 and less than -3. so that we can remove those rows from the dataset.
train=train[np.abs(train_ratings) < 3]
Now train dataframe will remove the outliers from the dataset.
**Note: You can apply 2 standard deviations as well because 2-std contains 95% of the data. Its all depends on the domain knowledge and your data. **

if (freq) x$counts else x$density length > 1 and only the first element will be used

for my thesis I have to calculate the number of workers at risk of substitution by machines. I have calculated the probability of substitution (X) and the number of employee at risk (Y) for each occupation category. I have a dataset like this:
X Y
1 0.1300 0
2 0.1000 0
3 0.0841 1513
4 0.0221 287
5 0.1175 3641
....
700 0.9875 4000
I tried to plot a histogram with this command:
hist(dataset1$X,dataset1$Y,xlim=c(0,1),ylim=c(0,30000),breaks=100,main="Distribution",xlab="Probability",ylab="Number of employee")
But I get this error:
In if (freq) x$counts else x$density
length > 1 and only the first element will be used
Can someone tell me what is the problem and write me the right command?
Thank you!

It is worth pointing out that the message displayed is a Warning message, and should not prevent the results being plotted. However, it does indicate there are some issues with the data.
Without the full dataset, it is not 100% obvious what may be the problem. I believe it is caused by the data not being in the correct format, with two potential issues. Firstly, some values have a value of 0, and these won't be plotted on the histogram. Secondly, the observations appear to be inconsistently spaced.
Histograms are best built from one of two datasets:
A dataframe which has been aggregated grouped into consistently sized bins.
A list of values X which in the data
I prefer the second technique. As originally shown here The expandRows() function in the package splitstackshape can be used to repeat the number of rows in the dataframe by the number of observations:
set.seed(123)
dataset1 <- data.frame(X = runif(900, 0, 1), Y = runif(900, 0, 1000))
library(splitstackshape)
dataset2 <- expandRows(dataset1, "Y")
hist(dataset2$X, xlim=c(0,1))
dataset1$bins <- cut(dataset1$X, breaks = seq(0,1,0.01), labels = FALSE)

Weighted Moving Avarage in Gretl

I have a question on gretl and how I can compute the filter of moving avarage.
I have a time series and I want to calculate the weighted moving avarage centered in 5 with these weights: 0.15, 0.2, 0.3, 0.2, 0.15.
In the main page of gretl we have the Variabile window where I can select Filter but there's no option for what I want to do, only, for example, simple moving avarage.
In R I would do something like this:
c<-as.vector()
for (in in 3:(T-2)){
c<-rbind(c, 0.15*x[i-2]+0.2*x[i-1]+0.3*x[i]+0.2*x[i+1]+0.15*x[i+2]}
where x is my time seriee and T is the number of observations.
But my questions are:
Does it exist an user-friendly way to do it in gretl?
If not, what is the best way to do it in the console? Does it exist a specific function?

Well I don't know what exactly you call user friendly, but since you want to have those specific weights, I guess there's no way around typing in some numbers, right?
So if I understand you correctly, and given your series x (in a dataset which is declared and recognized as a time series), then you simply would need to type the formula:
series weighma = 0.15 * x(+2) + 0.2 * x(+1) + 0.3 * x + 0.2 * x(-1) + 0.15 * x(-2)
(Instead of 'series' you could also type in 'genr' or just omit it, but I recommend this explicit variant. The same goes for the + signs inside the parentheses to indicate leads instead of lags.)
The name 'weighma' is of course arbitrary.
There are at least two places where you could type in that formula: Either choose Add /Define new variable from the menus, which gives you a dialog window with a formula field, or open the gretl console (or a script editor window).
A solution which would perhaps be more flexible in a script could use a gretl list of variables and the 'lincomb' function, something like this:
maxlead = 2
matrix weights = {0.15, 0.2, 0.3, 0.2, 0.15}
list xx = lags( nelem(weights), x(maxlead + 1) )
series weighma = lincomb(xx, weights)
The correct maxlead value could also be inferred from the length of the weights vector under the assumption of a centered MA, but I leave it at that.

Handling features not correlated with output prediction?

I do regression analysis with multiple features. Number of features is 20-23. For now, I check each feature correlation with output variable. Some features show correlation coefficient close to 1 or -1 (highly correlated). Some features show correlation coefficient near 0. My question is: do I have to remove this feature if it has close to 0 correlation coefficient? Or I can keep it and the only problem is that this feature will no make some noticeable effect to regression model or will have faint affect on it. Or removing that kind of features is obligatory?

In short
High (absolute) correlation between a feature and output implies that this feature should be valuable as predictor
Lack of correlation between feature and output implies nothing
More details
Pair-wise correlation only shows you how one thing affects the other, it says completely nothing about how good is this feature connected with others. So if your model is not trivial then you should not drop variables because they are not correlated with output). I will give you the example which should show you why.
Consider following sample, we have 2 features (X, Y), and one output value (Z, say red is 1, black is 0)
X Y Z
1 1 1
1 2 0
1 3 0
2 1 0
2 2 1
2 3 0
3 1 0
3 2 0
3 3 1
Let us compute the correlations:
CORREL(X, Z) = 0
CORREL(Y, Z) = 0
So... we should drop all values? One of them? If we drop any variable - our prolem becomes completely impossible to model! "magic" lies in the fact that there is actually a "hidden" relation in the data.
|X-Y|
0
1
2
1
0
1
2
1
0
And
CORREL(|X-Y|, Z) = -0.8528028654
Now this is a good predictor!
You can actually get a perfect regressor (interpolator) through
Z = 1 - sign(|X-Y|)

Issue in training hidden markov model and usage for classification

I am having a tough time in figuring out how to use Kevin Murphy's
HMM toolbox Toolbox. It would be a great help if anyone who has an experience with it could clarify some conceptual questions. I have somehow understood the theory behind HMM but it's confusing how to actually implement it and mention all the parameter setting.
There are 2 classes so we need 2 HMMs.
Let say the training vectors are :class1 O1={ 4 3 5 1 2} and class O_2={ 1 4 3 2 4}.
Now,the system has to classify an unknown sequence O3={1 3 2 4 4} as either class1 or class2.
What is going to go in obsmat0 and obsmat1?
How to specify/syntax for the transition probability transmat0 and transmat1?
what is the variable data going to be in this case?
Would number of states Q=5 since there are five unique numbers/symbols used?
Number of output symbols=5 ?
How do I mention the transition probabilities transmat0 and transmat1?

Instead of answering each individual question, let me illustrate how to use the HMM toolbox with an example -- the weather example which is usually used when introducing hidden markov models.
Basically the states of the model are the three possible types of weather: sunny, rainy and foggy. At any given day, we assume the weather can be only one of these values. Thus the set of HMM states are:
S = {sunny, rainy, foggy}
However in this example, we can't observe the weather directly (apparently we are locked in the basement!). Instead the only evidence we have is whether the person who checks on you every day is carrying an umbrella or not. In HMM terminology, these are the discrete observations:
x = {umbrella, no umbrella}
The HMM model is characterized by three things:
The prior probabilities: vector of probabilities of being in the first state of a sequence.
The transition prob: matrix describing the probabilities of going from one state of weather to another.
The emission prob: matrix describing the probabilities of observing an output (umbrella or not) given a state (weather).
Next we are either given the these probabilities, or we have to learn them from a training set. Once that's done, we can do reasoning like computing likelihood of an observation sequence with respect to an HMM model (or a bunch of models, and pick the most likely one)...
1) known model parameters
Here is a sample code that shows how to fill existing probabilities to build the model:
Q = 3; %# number of states (sun,rain,fog)
O = 2; %# number of discrete observations (umbrella, no umbrella)
%# prior probabilities
prior = [1 0 0];
%# state transition matrix (1: sun, 2: rain, 3:fog)
A = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5];
%# observation emission matrix (1: umbrella, 2: no umbrella)
B = [0.1 0.9; 0.8 0.2; 0.3 0.7];
Then we can sample a bunch of sequences from this model:
num = 20; %# 20 sequences
T = 10; %# each of length 10 (days)
[seqs,states] = dhmm_sample(prior, A, B, num, T);
for example, the 5th example was:
>> seqs(5,:) %# observation sequence
ans =
2 2 1 2 1 1 1 2 2 2
>> states(5,:) %# hidden states sequence
ans =
1 1 1 3 2 2 2 1 1 1
we can evaluate the log-likelihood of the sequence:
dhmm_logprob(seqs(5,:), prior, A, B)
dhmm_logprob_path(prior, A, B, states(5,:))
or compute the Viterbi path (most probable state sequence):
vPath = viterbi_path(prior, A, multinomial_prob(seqs(5,:),B))
2) unknown model parameters
Training is performed using the EM algorithm, and is best done with a set of observation sequences.
Continuing on the same example, we can use the generated data above to train a new model and compare it to the original:
%# we start with a randomly initialized model
prior_hat = normalise(rand(Q,1));
A_hat = mk_stochastic(rand(Q,Q));
B_hat = mk_stochastic(rand(Q,O));
%# learn from data by performing many iterations of EM
[LL,prior_hat,A_hat,B_hat] = dhmm_em(seqs, prior_hat,A_hat,B_hat, 'max_iter',50);
%# plot learning curve
plot(LL), xlabel('iterations'), ylabel('log likelihood'), grid on
Keep in mind that the states order don't have to match. That's why we need to permute the states before comparing the two models. In this example, the trained model looks close to the original one:
>> p = [2 3 1]; %# states permutation
>> prior, prior_hat(p)
prior =
1 0 0
ans =
0.97401
7.5499e-005
0.02591
>> A, A_hat(p,p)
A =
0.8 0.05 0.15
0.2 0.6 0.2
0.2 0.3 0.5
ans =
0.75967 0.05898 0.18135
0.037482 0.77118 0.19134
0.22003 0.53381 0.24616
>> B, B_hat(p,[1 2])
B =
0.1 0.9
0.8 0.2
0.3 0.7
ans =
0.11237 0.88763
0.72839 0.27161
0.25889 0.74111
There are more things you can do with hidden markov models such as classification or pattern recognition. You would have different sets of obervation sequences belonging to different classes. You start by training a model for each set. Then given a new observation sequence, you could classify it by computing its likelihood with respect to each model, and predict the model with the highest log-likelihood.
argmax[ log P(X|model_i) ] over all model_i

I do not use the toolbox that you mention, but I do use HTK. There is a book that describes the function of HTK very clearly, available for free
http://htk.eng.cam.ac.uk/docs/docs.shtml
The introductory chapters might help you understanding.
I can have a quick attempt at answering #4 on your list. . .
The number of emitting states is linked to the length and complexity of your feature vectors. However, it certainly does not have to equal the length of the array of feature vectors, as each emitting state can have a transition probability of going back into itself or even back to a previous state depending on the architecture. I'm also not sure if the value that you give includes the non-emitting states at the start and the end of the hmm, but these need to be considered also. Choosing the number of states often comes down to trial and error.
Good luck!

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Different number of inputs causes sematic problem in NN during training? - machine-learning

Related

How to remove the outliers using Python

if (freq) x$counts else x$density length > 1 and only the first element will be used

Weighted Moving Avarage in Gretl

Handling features not correlated with output prediction?

Issue in training hidden markov model and usage for classification

Categories

Resources