Why concatenate features in machine learning?

I am learning Microsoft's ML.NET framework and am confused about why features need to be concatenated. In Microsoft's Iris flower example here:
https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/iris-clustering
... features are concatenated:
string featuresColumnName = "Features";
var pipeline = mlContext.Transforms
.Concatenate(featuresColumnName, "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
...
Are multiple features treated as a single feature in order to do calculations like linear regression? If so, how is this accurate? What is happening behind the scenes?

According to the official documentation,
concatenation is necessary because trainers take feature vectors as
inputs.
It essentially transforms the features from separate columns into a single column of feature vectors. The feature values themselves remain intact; only their format and type are changed. The following example makes this clearer:
Before transformation:
var samples = new List<InputData>()
{
    new InputData(){ Feature1 = 0.1f, Feature2 = new[]{ 1.1f, 2.1f, 3.1f }, Feature3 = 1 },
    new InputData(){ Feature1 = 0.2f, Feature2 = new[]{ 1.2f, 2.2f, 3.2f }, Feature3 = 2 },
    new InputData(){ Feature1 = 0.3f, Feature2 = new[]{ 1.3f, 2.3f, 3.3f }, Feature3 = 3 },
    new InputData(){ Feature1 = 0.4f, Feature2 = new[]{ 1.4f, 2.4f, 3.4f }, Feature3 = 4 },
    new InputData(){ Feature1 = 0.5f, Feature2 = new[]{ 1.5f, 2.5f, 3.5f }, Feature3 = 5 },
    new InputData(){ Feature1 = 0.6f, Feature2 = new[]{ 1.6f, 2.6f, 3.6f }, Feature3 = 6 },
};
After:
// "Features" column obtained post-transformation.
// 0.1 1.1 2.1 3.1 1
// 0.2 1.2 2.2 3.2 2
// 0.3 1.3 2.3 3.3 3
// 0.4 1.4 2.4 3.4 4
// 0.5 1.5 2.5 3.5 5
// 0.6 1.6 2.6 3.6 6
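Conceptually, the transform just flattens each row's scalar and vector columns into one flat vector, in the order the columns are listed. The sketch below illustrates that idea in plain Python. It is not ML.NET's implementation, and the helper name concatenate_features is made up for illustration.

```python
def concatenate_features(row, column_names):
    """Flatten the named columns of one row into a single feature vector.

    Hypothetical helper for illustration only; not part of ML.NET.
    """
    vector = []
    for name in column_names:
        value = row[name]
        if isinstance(value, (list, tuple)):
            # A vector-valued column contributes all of its elements.
            vector.extend(float(v) for v in value)
        else:
            # A scalar column contributes exactly one element.
            vector.append(float(value))
    return vector


samples = [
    {"Feature1": 0.1, "Feature2": [1.1, 2.1, 3.1], "Feature3": 1},
    {"Feature1": 0.2, "Feature2": [1.2, 2.2, 3.2], "Feature3": 2},
]
columns = ["Feature1", "Feature2", "Feature3"]
features = [concatenate_features(row, columns) for row in samples]
print(features[0])  # [0.1, 1.1, 2.1, 3.1, 1.0]
```

Each of the five values is still a separate input dimension for the trainer; nothing is merged or averaged into a single number.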

Related

Forecasting using multiple seasonal STL and ARIMA

I am attempting to forecast half-hourly electricity data. The method I am using is to decompose the electricity consumption data using 'mstl' from the 'forecast' package by Rob Hyndman and then forecast the seasonally adjusted data using ARIMA.
df <- IntervalData %>% select(CONSUMPTION_MW)
length_test_set = 17520
h = 17520
# create msts object with daily, weekly and monthly seasonality
data_msts <- msts(df, seasonal.periods=c(48,48*7,365/12*48))
train_msts = msts(df[1:(nrow(df)-length_test_set),],seasonal.periods=c(48,48*7,365/12*48))
test_msts = msts(df[((nrow(df)-length_test_set)+1):(nrow(df)),],seasonal.periods=c(48,48*7,365/12*48))
fit_mstl = mstl(train_msts, iterate = 4, s.window = 19, robust = TRUE)
fcast_arima=forecast(fit_mstl,method='arima',h=h)
How do I specify the order of my ARIMA model, e.g. ARIMA(2,1,6)?
You will need to write your own forecast function like this (using fake data so it can be reproduced).
library(forecast)
df <- data.frame(y=rnorm(50000))
length_test_set <- 17520
h <- 17520
# create msts object with daily, weekly and monthly seasonality
data_msts <- msts(df, seasonal.periods = c(48, 48*7, 365/12*48))
train_msts <- msts(df[1:(nrow(df) - length_test_set), ], seasonal.periods = c(48, 48 * 7, 365 / 12 * 48))
test_msts <- msts(df[((nrow(df) - length_test_set) + 1):(nrow(df)), ], seasonal.periods = c(48, 48 * 7, 365 / 12 * 48))
fit_mstl <- mstl(train_msts, iterate = 4, s.window = 19, robust = TRUE)
# Function to fit specific ARIMA model and return forecasts
arima_forecast <- function(x, h, level, order, ...) {
  fit <- Arima(x, order = order, seasonal = c(0, 0, 0), ...)
  return(forecast(fit, h = h, level = level))
}
# Example using an ARIMA(3,0,0) model
fcast_arima <- forecast(fit_mstl, forecastfunction=arima_forecast, h = h, order=c(3,0,0))
Created on 2020-07-25 by the reprex package (v0.3.0)

Training Loss Isn't Decreasing Across Epochs

I am trying to construct a Siamese neural network that takes in two facial expressions and outputs the probability that the two images are similar. I have 5 people with 10 expressions per person, so 50 images in total, but with a Siamese setup I can generate 2500 pairs (with repetition). I have already run dlib's facial landmark detection on each of the 50 images, so each of the two inputs to the Siamese net is a flattened array of 136 elements. The Siamese structure is below:
input_shape = (136,)
left_input = Input(input_shape, name = 'left')
right_input = Input(input_shape, name = 'right')
convnet = Sequential()
convnet.add(Dense(50,activation="relu"))
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)
L1_layer = Lambda(lambda tensors:K.abs(tensors[0] - tensors[1]))
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1,activation='relu')(L1_distance)
siamese_net = Model(inputs=[left_input,right_input],outputs=prediction)
optimizer = Adam()
siamese_net.compile(loss="binary_crossentropy",optimizer=optimizer)
I have an array called x_train, which holds 80% of all possible labels and whose elements are lists of lists. data is a 5x10x68x2 array: 5 people, 10 expressions per person, 68 facial landmarks, and 2 positions (x, y) per landmark.
x_train = [ [ [Person, Expression] , [Person, Expression] ], ...]
data = np.load('data.npy')
My train loop goes as follows:
def train(data, labels, network, epochs, batch_size):
    track_loss = defaultdict(list)
    for i in range(0, epochs):
        iterations = len(labels)//batch_size
        remain = len(labels)%batch_size
        shuffle(labels)
        print('Epoch%s-----' %(i + 1))
        for j in range(0, iterations):
            batch = [j*batch_size, j*batch_size + batch_size]
            if(j == iterations - 1):
                batch[1] += remain
            mini_batch = np.zeros(shape = (batch[1] - batch[0], 2, 136))
            for k in range(batch[0], batch[1]):
                prepx = data[labels[k][0][0],labels[k][0][1],:,:]
                prepy = data[labels[k][1][0],labels[k][1][1],:,:]
                mini_batch[k - batch[0]][0] = prepx.flatten()
                mini_batch[k - batch[0]][1] = prepy.flatten()
            targets = np.array([1 if(labels[i][0][1] == labels[i][1][1]) else 0 for i in range(batch[0], batch[1])])
            new_batch = mini_batch.reshape(batch[1] - batch[0], 2, 136, 1)
            new_targets = targets.reshape(batch[1] - batch[0], 1)
            #print(mini_batch.shape, targets.shape)
            loss=siamese_net.train_on_batch(
                {
                    'left': mini_batch[:, 0, :],
                    'right': mini_batch[:, 1, :]
                },targets)
            track_loss['Epoch%s'%(i)].append(loss)
    return network, track_loss
siamese_net, track_loss = train(data, x_train,siamese_net, 20, 30)
The value of each element in the target array is either a 0 or 1, depending on whether the two expressions inputted into the net are different or the same.
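To make the pair and target construction concrete, here is a minimal sketch that enumerates every ordered pair of the 50 (person, expression) images and labels a pair 1 when the two expressions match, following the same rule as the targets line in the training loop. The variable names are illustrative and not taken from the original code.

```python
from itertools import product

people = range(5)         # 5 people
expressions = range(10)   # 10 expressions per person

# Every image is identified by a (person, expression) tuple: 50 in total.
images = list(product(people, expressions))

# All ordered pairs, including an image paired with itself: 50 * 50 = 2500.
pairs = list(product(images, images))

# Target is 1 when both images show the same expression, otherwise 0.
targets = [1 if left[1] == right[1] else 0 for left, right in pairs]

print(len(pairs), sum(targets))  # 2500 250
```

Under this labelling only 250 of the 2500 pairs are positive, so the classes are imbalanced, which may be worth keeping in mind when interpreting the loss.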
Although the Omniglot example I have seen used far more training and test images, my neural net's loss does not decrease at all.
EDIT:
Here is the new loss with the fixed target tensor:
Epoch1-----Loss: 1.979214
Epoch2-----Loss: 1.631347
Epoch3-----Loss: 1.628090
Epoch4-----Loss: 1.634603
Epoch5-----Loss: 1.621578
Epoch6-----Loss: 1.631347
Epoch7-----Loss: 1.631347
Epoch8-----Loss: 1.631347
Epoch9-----Loss: 1.621578
Epoch10-----Loss: 1.634603
Epoch11-----Loss: 1.634603
Epoch12-----Loss: 1.621578
Epoch13-----Loss: 1.628090
Epoch14-----Loss: 1.624834
Epoch15-----Loss: 1.631347
Epoch16-----Loss: 1.634603
Epoch17-----Loss: 1.628090
Epoch18-----Loss: 1.631347
Epoch19-----Loss: 1.624834
Epoch20-----Loss: 1.624834
I want to know how to improve my architecture, training procedure, or even data preparation in order to improve the performance of the neural net. I assumed that using dlib's facial landmark detection would reduce the complexity of the neural net, but I am starting to doubt that hypothesis.

Prediction with FanChenLinSupportVectorRegression

I want to use FanChenLinSupportVectorRegression in Accord.NET. The predictions are correct for the training inputs, but the model doesn't work for other inputs. What is my mistake?
In the example below, the first prediction is good; however, if we try to predict for a configuration that was not learned, the prediction is always the same regardless of the inputs:
// Declare a very simple regression problem
// with only 2 input variables (x and y):
double[][] inputs =
{
new[] { 3.0, 1.0 },
new[] { 7.0, 1.0 },
new[] { 3.0, 1.0 },
new[] { 3.0, 2.0 },
new[] { 6.0, 1.0 },
};
// The task is to output a weighted sum of those numbers
// plus an independent constant term: 7.4x + 1.1y + 42
double[] outputs =
{
7.4*3.0 + 1.1*1.0 + 42.0,
7.4*7.0 + 1.1*1.0 + 42.0,
7.4*3.0 + 1.1*1.0 + 42.0,
7.4*3.0 + 1.1*2.0 + 42.0,
7.4*6.0 + 1.1*1.0 + 42.0,
};
// Create a LibSVM-based support vector regression algorithm
var teacher = new FanChenLinSupportVectorRegression<Gaussian>()
{
Tolerance = 1e-5,
// UseKernelEstimation = true,
// UseComplexityHeuristic = true
Complexity = 10000,
Kernel = new Gaussian(0.1)
};
// Use the algorithm to learn the machine
var svm = teacher.Learn(inputs, outputs);
// Get machine's predictions for inputs
double[] prediction = svm.Score(inputs);
// It's OK the predictions are correct
double[][] inputs1 =
{
new[] { 2.0, 2.0 },
new[] { 5.0, 1.0 },
};
prediction = svm.Score(inputs1);
// predictions are wrong! what is my mistake?

Sensitivity derived from the scikit-learn confusion matrix doesn't match scikit-learn's recall_score

true = [1,0,0,1]
predict = [1,1,1,1]
cf = sk.metrics.confusion_matrix(true,predict)
print(cf)
array([[0, 2],
       [0, 2]])
tp = cf[0][0]
fn = cf[0][1]
fp = cf[1][0]
tn = cf[1][1]
sensitivity= tp/(tp+fn)
print(sensitivity)
0.0
print(sk.metrics.recall_score(true, predict))
1.0
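According to the scikit-learn documentation, recall_score should match this definition of sensitivity.
Can somebody explain a bit more about this?
The confusion matrix cells are mislabeled. scikit-learn puts the true labels on the rows and the predicted labels on the columns, in sorted label order, so for labels 0 and 1 the cells must be read in the following way: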
tn = cf[0][0]
fp = cf[0][1]
fn = cf[1][0]
tp = cf[1][1]
sensitivity= tp/(tp+fn)
print(sensitivity)
1.0
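The corrected cell layout can be checked without scikit-learn. The sketch below counts the four cells by hand for the same data and arranges them as [[tn, fp], [fn, tp]], the layout confusion_matrix uses for labels 0 and 1. The helper confusion_counts is a made-up name for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels 0 and 1 (illustrative helper)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


y_true = [1, 0, 0, 1]
y_pred = [1, 1, 1, 1]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
matrix = [[tn, fp], [fn, tp]]      # rows: true label, columns: predicted label
sensitivity = tp / (tp + fn)       # recall for the positive class

print(matrix, sensitivity)  # [[0, 2], [0, 2]] 1.0
```

With scikit-learn itself, confusion_matrix(y_true, y_pred).ravel() returns the cells in the same tn, fp, fn, tp order for the binary case, which avoids indexing mistakes.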

Classifying new instance with bayesian net

Say I have the following bayesian network:
And I want to classify a new instance on whether H=true or H=false.
The new instance looks, for example, like this: Fl=true, A=false, S=true, and Ti=false.
How can I classify the instance with respect to H?
I can compute the probability by multiplying the probabilities from the tables:
0.4 * 0.7 * 0.5 * 0.2 = 0.028
What does this say about whether the new instance is a positive instance H or not?
EDIT
I will try to compute the probability according to Bernhard Kausler's suggestion:
So this is Bayes' rule:
P(H|S,Ti,Fi,A) = P(H,S,Ti,Fi,A) / P(S,Ti,Fi,A)
To compute the denominator:
P(S,Ti,Fi,A) = P(H=T,S,Ti,Fi,A)+P(H=F,S,Ti,Fi,A) = (0.7 * 0.5 * 0.8 * 0.4 * 0.3) + (0.3 * 0.5 * 0.8 * 0.4 * 0.3) =0.048
P(H,S,Ti,Fi,A) = 0.7 * 0.5 * 0.8 * 0.4 * 0.3 = 0.0336
so P(H|S,Ti,Fi,A) = 0.0336 / 0.048 = 0.7
Now I compute P(H=false|S,Ti,Fi,A) = P(H=false,S,Ti,Fi,A) / P(S,Ti,Fi,A)
We already have the value for P(S,Ti,Fi,A): it is 0.048.
P(H=false,S,Ti,Fi,A) = 0.0144
so P(H=false|S,Ti,Fi,A) = 0.0144 / 0.048 = 0.3
The probability P(H=true|S,Ti,Fi,A) is the highest, so the new instance will be classified as H=true.
Is this correct?
Addition: We do not need to calculate P(H=false|S,Ti,Fi,A) because it is 1 - P(H=true|S,Ti,Fi,A).
So, you want to compute the conditional probability P(H|S,Ti,Fi,A). To do that, you have to use Bayes' rule:
P(H|S,Ti,Fi,A) = P(H,S,Ti,Fi,A) / P(S,Ti,Fi,A)
where
P(S,Ti,Fi,A) = P(H=T,S,Ti,Fi,A)+P(H=F,S,Ti,Fi,A)
You then calculate both conditional probabilities P(H=T|S,Ti,Fi,A) and P(H=F|S,Ti,Fi,A) and make a prediction according to which probability is higher.
Just multiplying up the numbers like you did won't help and doesn't even give you a proper probability since the product is not normalized.
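As a small numeric sketch of that normalization, the snippet below reuses the factor values from the question's own calculation (they are not read from the network's tables, which are not shown here) and divides each joint probability by their sum.

```python
# Joint probabilities P(H=h, S, Ti, Fi, A), using the factors from the question.
joint_true = 0.7 * 0.5 * 0.8 * 0.4 * 0.3    # H = true
joint_false = 0.3 * 0.5 * 0.8 * 0.4 * 0.3   # H = false

# Marginalize H out to get the evidence P(S, Ti, Fi, A).
evidence = joint_true + joint_false

# Bayes' rule: normalize each joint probability by the evidence.
posterior_true = joint_true / evidence
posterior_false = joint_false / evidence

# Predict the value of H with the larger posterior.
prediction = posterior_true > posterior_false

print(round(posterior_true, 3), round(posterior_false, 3), prediction)  # 0.7 0.3 True
```

The two posteriors sum to 1, which is exactly what the unnormalized product lacks.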
