I am using an XGBoost classifier that classifies X-ray images into 3 classes.
My problem is that when I calculate these values manually (by hand) from the confusion matrix, the results do not match the values in the classification report, even though I used the standard equations.
I need help with how to calculate these values (accuracy, precision, and recall) by hand.
Here is the classification report:
              precision    recall  f1-score   support

           0     1.0000    0.9052    0.9502       116
           1     0.8267    0.9180    0.8700       317
           2     0.9627    0.9357    0.9490       855

    accuracy                         0.9286      1288
   macro avg     0.9298    0.9196    0.9231      1288
weighted avg     0.9326    0.9286    0.9297      1288
and this is the confusion matrix:
[[0.90 0.05 0.04]
 [0.00 0.91 0.08]
 [0.00 0.06 0.93]]
Accuracy
How many of the predictions are correct overall? (Closer to 1 is better.)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Recall
Of the samples that are actually positive, the proportion that are predicted positive: how many of the true positives did the model find? (Closer to 1 is better.)
Recall = TP / (TP + FN)
Precision
Of the samples predicted positive, the proportion that are actually positive: how accurate are the positive predictions? (Closer to 1 is better.)
Precision = TP / (TP + FP)
Okay, let's work through an example 3 x 3 confusion matrix.
class A precision = 15 / 24 = 0.625
class B precision = 15 / 20 = 0.75
class C precision = 45 / 56 = 0.80
class A recall = 15 / 20 = 0.75
class B recall = 15 / 30 = 0.5
class C recall = 45 / 50 = 0.9
Accuracy of classifier = (15 + 15 + 45) / 100 = 0.75
Weighted Average Precision = fraction of actual class A instances * precision of class A + fraction of actual class B instances * precision of class B + fraction of actual class C instances * precision of class C
= 20 / 100 * 0.625 + 30 / 100 * 0.75 + 50 / 100 * 0.8
= 0.75
Weighted Average Recall = fraction of actual class A instances * recall of class A + fraction of actual class B instances * recall of class B + fraction of actual class C instances * recall of class C
= 20 / 100 * 0.75 + 30 / 100 * 0.5 + 50 / 100 * 0.9
= 0.75
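To check the same arithmetic mechanically, here is a small sketch in Python with numpy; the arrays simply restate the diagonal, column sums, and row sums of the worked example above:

    import numpy as np

    tp = np.array([15, 15, 45])          # diagonal: true positives per class
    predicted = np.array([24, 20, 56])   # column sums: instances predicted as each class
    actual = np.array([20, 30, 50])      # row sums: actual instances (support)

    precision = tp / predicted           # [0.625, 0.75, ~0.80]
    recall = tp / actual                 # [0.75, 0.5, 0.9]
    accuracy = tp.sum() / actual.sum()   # 0.75
    weights = actual / actual.sum()
    print((weights * precision).sum())   # weighted average precision, ~0.75
    print((weights * recall).sum())      # weighted average recall, 0.75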
In your case, reading the same quantities off the row-normalized confusion matrix:
class A precision = 0.90 / 0.90 = 1
class B precision = 0.91 / 1.02 = 0.89
class C precision = 0.93 / 1.05 = 0.89
class A recall = 0.90 / 0.99 = 0.91
class B recall = 0.91 / 0.99 = 0.92
class C recall = 0.93 / 0.99 = 0.94
Accuracy of classifier = (0.90 + 0.91 + 0.93) / 2.97 = 0.92
Weighted Average Precision = fraction of actual class A instances * precision of class A + fraction of actual class B instances * precision of class B + fraction of actual class C instances * precision of class C
= 0.99 / 2.97 * 1 + 0.99 / 2.97 * 0.89 + 0.99 / 2.97 * 0.89 = 0.93
Weighted Average Recall = fraction of actual class A instances * recall of class A + fraction of actual class B instances * recall of class B + fraction of actual class C instances * recall of class C
= 0.99 / 2.97 * 0.91 + 0.99 / 2.97 * 0.92 + 0.99 / 2.97 * 0.94 = 0.92
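One caveat, which is the likely cause of the original mismatch: the per-class numbers above implicitly treat every class as having equal support, because the posted matrix is row-normalized (each row sums to about 0.99) rather than raw counts. The classification report uses the true supports of 116, 317, and 855. If you scale each row back up by its support before summing columns, you recover values close to the report; a sketch (the small remaining differences come from the matrix being rounded to two decimals):

    import numpy as np

    R = np.array([[0.90, 0.05, 0.04],    # rows: actual class, row-normalized
                  [0.00, 0.91, 0.08],
                  [0.00, 0.06, 0.93]])
    support = np.array([116, 317, 855])  # actual instances per class, from the report

    counts = R * support[:, None]                  # approximate raw counts
    print(counts.diagonal() / counts.sum(axis=0))  # precision ~[1.00, 0.83, 0.96]
    print(counts.diagonal() / counts.sum(axis=1))  # recall    ~[0.91, 0.92, 0.94]
    print(counts.diagonal().sum() / counts.sum())  # accuracy  ~0.93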
Using GEKKO Python, we are having trouble learning a parameter that can vary multiple times per day. In some disciplines this is also known as 'regime detection' or 'regime change detection'. We (my colleague Henri ter Hofte from Windesheim University of Applied Sciences and I) conceived 3 strategies but fail (more below).
Our question(s):
What are we doing wrong? Is there an obvious error in our GEKKO code (more below in the details)?
Is strategy I doomed to fail, and should we switch to strategy II or III?
Is GEKKO Python even suitable for doing this kind of regime (change) detection?
Your help is much appreciated.
=== The problem:
We have time series data about:
(1) CO₂ concentration
(2) ventilation rates (or rather: valve fractions, which give ventilation rates when multiplied by known maximum ventilation rates)
(3) occupancy (number of persons in a room)
For research question (A) we would like to know a proper estimate for (2) for each hour of the day, given time series data about (1) and (3).
For research question (B) we would like to know a proper estimate for (3) for each hour of the day, given time series data about (1) and (2).
We focus on research question A (but have similar questions for B).
=== The 3 strategies:
We considered 3 different strategies for implementing this in GEKKO Python:
Strategy I. Declare the variable valve_frac as a Manipulated Variable in our GEKKO model (m.MV), since the GEKKO documentation says these variables can be "adjusted by the optimizer to minimize an objective function at every time point" and "Manipulated variables are like FVs, but can change with each data row, either calculated by the optimizer (STATUS=1) or specified by the user (STATUS=0)", according to https://gekko.readthedocs.io/en/latest/imode.html#mv
Strategy II. Split the time into several shorter time spans (e.g. one time span per hour) and then learn valve_frac as a GEKKO Fixed Variable (m.FV), one for each hour.
Strategy III. Reframe the problem to GEKKO as if it were a control problem: the setpoint is reaching a particular CO2 concentration, and GEKKO can use valve_frac as a Control Variable (m.CV).
We tried implementing strategy I (see more info and code below) but failed to get proper results.
Considering an equation derived from physics, we intend to find the best value for one specific variable (valve_frac__0 in the following table), given a dataframe (df_learn) like this:
Index  Date-Time              occupancy__p  valve_frac__0  co2__ppm
1      2022.12.01 – 00:00:00  0             0.51           546
2      2022.12.01 – 00:15:00  4             0.85           820
3      2022.12.01 – 00:30:00  1             0.21           595
4      2022.12.01 – 00:45:00  2             0.74           635
5      2022.12.01 – 00:15:00  0             0.65           559
6      2022.12.01 – 00:15:00  0             0.45           538
7      2022.12.01 – 00:15:00  2             0.82           659
...    ...                    ...           ...            ...
1920   2022.12.20 – 00:15:00  3             0.73           749
We are trying to develop a moving horizon estimation model (IMODE=5) or control model (IMODE=6) to predict the valve_frac__0 value.
Here is the code in GEKKO format:
=== Code:
from gekko import GEKKO
import numpy as np

# duration__s, step__s, vent_max__m3_s_1, room__m3 and df_learn are
# assumed to be defined earlier (see the complete script in the answer below).

# Gekko Model - Initialize
m = GEKKO(remote = False)
m.time = np.arange(0, duration__s, step__s)
# Conversion factors
s_min_1 = 60
min_h_1 = 60
s_h_1 = s_min_1 * min_h_1
mL_m_3 = 1e3 * 1e3
million = 1e6
# Constants
MET__mL_min_1_kg_1_p_1 = 3.5
desk_work__MET = 1.5
P_std__Pa = 101325
R__m3_Pa_K_1_mol_1 = 8.3145
T_room__degC = 20.0
T_std__degC = 0.0
T_zero__K = 273.15
T_std__K = T_zero__K + T_std__degC
T_room__K = T_zero__K + T_room__degC
infilt__m2 = 0.001
# Approximations
room__mol_m_3 = P_std__Pa / (R__m3_Pa_K_1_mol_1 * T_room__K)
std__mol_m_3 = P_std__Pa / (R__m3_Pa_K_1_mol_1 * T_std__K)
co2_ext__ppm = 415
# National averages
weight__kg = 77.5
MET__m3_s_1_p_1 = MET__mL_min_1_kg_1_p_1 * weight__kg / (s_min_1 * mL_m_3)
MET_mol_s_1_p_1 = MET__m3_s_1_p_1 * std__mol_m_3
co2_o2 = 0.894
co2__mol0_p_1_s_1 = co2_o2 * desk_work__MET * MET_mol_s_1_p_1
# Room averages
wind__m_s_1 = 3.0
# GEKKO Manipulated Variables: measured values
occupancy__p = m.MV(value = df_learn.occupancy__p.values)
occupancy__p.STATUS = 0; occupancy__p.FSTATUS = 1
# Strategy I:
valve_frac__0 = m.MV(value = df_learn.valve_frac__0.values)
valve_frac__0.STATUS = 1; valve_frac__0.FSTATUS = 0
# Strategy II:
#valve_frac__0 = m.FV(value = df_learn.valve_frac__0.values)
#valve_frac__0.STATUS = 1; valve_frac__0.FSTATUS = 0
# GEKKO Control Variable (predicted variable)
co2__ppm = m.CV(value = df_learn.co2__ppm.values)
co2__ppm.STATUS = 1; co2__ppm.FSTATUS = 1
# GEKKO - Equations
co2_loss__ppm_s_1 = m.Intermediate((co2__ppm - co2_ext__ppm) * (vent_max__m3_s_1 * valve_frac__0 + wind__m_s_1 * infilt__m2) / room__m3)
co2_gain_mol0_s_1 = m.Intermediate(occupancy__p * co2__mol0_p_1_s_1 / (room__m3 * room__mol_m_3))
co2_gain__ppm_s_1 = m.Intermediate(co2_gain_mol0_s_1 * million)
m.Equation(co2__ppm.dt() == co2_gain__ppm_s_1 - co2_loss__ppm_s_1)
# GEKKO - Solver setting
m.options.IMODE = 5
m.options.EV_TYPE = 1
m.options.NODES = 2
m.solve(disp = False)
The results I got for each strategy are as follows:
Strategy I:
There is no output for the simulated co2__ppm, and the output value is
valve_frac__0 = 0
Strategy II:
There is a big difference between the simulated and measured co2__ppm, and the output value is
valve_frac__0 = 0.166 (which is not reasonable)
The code looks like it should work as long as valve_frac__0 is the adjustable unknown parameter that should be estimated from the CO2 PPM data. Here is a result on a smaller subset of the posted data.
The data doesn't fit exactly if there is a lower bound of zero on the valve position.
valve_frac__0 = m.MV(value = valve_frac__0,lb=0)
Otherwise, the valve position can be adjusted to fit the CO2 data perfectly.
Here is a complete script with the sample data.
from gekko import GEKKO
import numpy as np
# Gekko Model - Initialize
m = GEKKO(remote = False)
# data
# 1 2022.12.01 – 00:00:00 0 0.51 546
# 2 2022.12.01 – 00:15:00 4 0.85 820
# 3 2022.12.01 – 00:30:00 1 0.21 595
# 4 2022.12.01 – 00:45:00 2 0.74 635
# 5 2022.12.01 – 00:15:00 0 0.65 559
# 6 2022.12.01 – 00:15:00 0 0.45 538
# 7 2022.12.01 – 00:15:00 2 0.82 659
occupancy__p = np.array([0,4,1,2,0,0,2])
valve_frac__0 = np.array([0.51,0.85,0.21,0.74,0.65,0.45,0.82])
co2__ppm_meas = np.array([546,820,595,635,559,538,659])
duration__s = len(co2__ppm_meas)
m.time = np.linspace(0,duration__s-1,duration__s)
vent_max__m3_s_1 = 1
room__m3 = 1
# Conversion factors
s_min_1 = 60
min_h_1 = 60
s_h_1 = s_min_1 * min_h_1
mL_m_3 = 1e3 * 1e3
million = 1e6
# Constants
MET__mL_min_1_kg_1_p_1 = 3.5
desk_work__MET = 1.5
P_std__Pa = 101325
R__m3_Pa_K_1_mol_1 = 8.3145
T_room__degC = 20.0
T_std__degC = 0.0
T_zero__K = 273.15
T_std__K = T_zero__K + T_std__degC
T_room__K = T_zero__K + T_room__degC
infilt__m2 = 0.001
# Approximations
room__mol_m_3 = P_std__Pa / (R__m3_Pa_K_1_mol_1 * T_room__K)
std__mol_m_3 = P_std__Pa / (R__m3_Pa_K_1_mol_1 * T_std__K)
co2_ext__ppm = 415
# National averages
weight__kg = 77.5
MET__m3_s_1_p_1 = MET__mL_min_1_kg_1_p_1 \
* weight__kg / (s_min_1 * mL_m_3)
MET_mol_s_1_p_1 = MET__m3_s_1_p_1 * std__mol_m_3
co2_o2 = 0.894
co2__mol0_p_1_s_1 = co2_o2 * desk_work__MET * MET_mol_s_1_p_1
# Room averages
wind__m_s_1 = 3.0
# GEKKO Manipulated Variables: measured values
occupancy__p = m.MV(value = occupancy__p)
occupancy__p.STATUS = 0; occupancy__p.FSTATUS = 1
# Strategy I:
valve_frac__0 = m.MV(value = valve_frac__0,lb=0)
valve_frac__0.STATUS = 1; valve_frac__0.FSTATUS = 0
# Strategy II:
#valve_frac__0 = m.FV(value = df_learn.valve_frac__0.values)
#valve_frac__0.STATUS = 1; valve_frac__0.FSTATUS = 0
# GEKKO Control Variable (predicted variable)
co2__ppm = m.CV(value = co2__ppm_meas)
co2__ppm.STATUS = 1; co2__ppm.FSTATUS = 1
# GEKKO - Equations
co2_loss__ppm_s_1 = m.Intermediate((co2__ppm - co2_ext__ppm) \
* (vent_max__m3_s_1 * valve_frac__0 \
+ wind__m_s_1 * infilt__m2) / room__m3)
co2_gain_mol0_s_1 = m.Intermediate(occupancy__p \
* co2__mol0_p_1_s_1 / (room__m3 * room__mol_m_3))
co2_gain__ppm_s_1 = m.Intermediate(co2_gain_mol0_s_1 * million)
m.Equation(co2__ppm.dt() == co2_gain__ppm_s_1 - co2_loss__ppm_s_1)
# GEKKO - Solver setting
m.options.IMODE = 5
m.options.EV_TYPE = 1
m.options.NODES = 2
m.options.SOLVER = 1
m.solve(disp = True)
import matplotlib.pyplot as plt
plt.subplot(2,1,1)
plt.plot(m.time,valve_frac__0.value,'r-',label='Valve Frac')
plt.legend(); plt.grid(); plt.ylabel('Valve Frac')
plt.subplot(2,1,2)
plt.plot(m.time,co2__ppm_meas,'ko',label='Measured')
plt.plot(m.time,co2__ppm.value,'k--',label='Predicted')
plt.legend(); plt.grid()
plt.xlabel('Time'); plt.ylabel('CO2')
plt.savefig('results.png',dpi=300)
plt.show()
For question B, adjust the code so that the valve position is fixed at the measured values and the occupancy is determined by the optimizer.
occupancy__p = m.MV(value = occupancy__p)
occupancy__p.STATUS = 1; occupancy__p.FSTATUS = 0
# Strategy I:
valve_frac__0 = m.MV(value = valve_frac__0,lb=0)
valve_frac__0.STATUS = 0; valve_frac__0.FSTATUS = 1
Use occupancy__p.MV_STEP_HOR = 2 or higher to decrease the frequency at which the optimized parameter can change (e.g. every 2 hours).
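As a minimal sketch, assuming the 15-minute sample interval shown in the data (4 rows per hour):

    # Let the estimated occupancy move at most once per hour (4 x 15 min);
    # use 8 to allow a move only once every 2 hours.
    occupancy__p.MV_STEP_HOR = 4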
I am creating a Rails application which is like a game, so it has points and levels. For example: to reach level one the user has to collect at least 100 points, and to reach level two the user has to collect 200 points. The difference in points between consecutive levels increases every 10 levels: the difference between levels one and two is 100, the difference between levels 11 and 12 is 150, and so on. There is no upper bound for levels.
Now my question: say a user's total is 3150 points and it just got updated to 3155. What's the optimal way to find the current level and update it if needed?
I can get a solution using a while loop with another loop nested inside it, which runs in O(n^2). I need something better.
I think this code works but I'm not sure if this is the best way to go about it
def get_level(points)
  diff = 100          # point gap between consecutive levels in the current block of 10
  sum = 0             # total points required to reach current_level
  level = -1
  current_level = 0
  while level.negative?
    10.times do
      current_level += 1
      sum += diff
      if points < sum # not enough points for current_level, so the previous one is it
        level = current_level - 1
        break
      end
    end
    diff += 50        # the per-level gap grows by 50 every 10 levels
  end
  level
end
I wrote a get_points function (it is not difficult to follow). Then, based on it, I wrote a get_level function, in which it was necessary to solve a quadratic equation to find the high value and then calculate low from the remainder.
If you have any questions, let me know.
Check the output printed by main() below.
#!/usr/bin/env python3
import math

def get_points(level):
    high = (level + 1) // 10
    low = (level + 1) % 10
    high_point = 250 * high * high + 750 * high  # (3 + high) * high // 2 * 500
    low_point = (100 + 50 * high) * low
    return low_point + high_point

def get_level(points):
    # quadratic equation
    a = 250
    b = 750
    c = -points
    d = b * b - 4 * a * c
    x = (-b + math.sqrt(d)) / (2 * a)
    high = int(x)
    remainder = points - (250 * high * high + 750 * high)
    low = remainder // (100 + 50 * high)
    level = high * 10 + low
    return level

def main():
    for l in range(0, 40):
        print(f'{l:3d} {get_points(l - 1):5d}..{get_points(l) - 1}')
    for level, (l, r) in (
        (1, (100, 199)),
        (2, (200, 299)),
        (9, (900, 999)),
        (10, (1000, 1149)),
        (11, (1150, 1299)),
        (19, (2350, 2499)),
        (20, (2500, 2699)),
    ):
        for p in range(l, r + 1):  # for p in [l, r]
            assert get_level(p) == level, f'{p} {l}'

if __name__ == '__main__':
    main()
Why did you set the value of a=250 and b = 750? Can you explain that to me please?
Let's write out every 10th level and the difference in points:
lvl - pnt (+delta)
10 - 1000 (+1000 = +100 * 10)
20 - 2500 (+1500 = +150 * 10)
30 - 4500 (+2000 = +200 * 10)
40 - 7000 (+2500 = +250 * 10)
Divide by 500 (10 levels * the 50-point change in difference) and we get an arithmetic progression starting at 2:
10 - 2 (+2)
20 - 5 (+3)
30 - 9 (+4)
40 - 14 (+5)
Using the arithmetic progression sum formula, the points for level = k * 10 equal:
sum(x for x in 2..k+1) * 500
= (2 + k + 1) * k / 2 * 500
= (3 + k) * k * 250
= 250 * k * k + 750 * k
Now we have points and want to find the maximum high such that points >= 250 * high^2 + 750 * high, i.e. 250 * high^2 + 750 * high - points <= 0. The coefficient a = 250 is positive, so the branches of the parabola are directed up. We then find the positive root x of the quadratic equation 250 * high^2 + 750 * high - points = 0 and discard the fractional part (that is high = int(x) in the Python script).
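As a quick check against the numbers in the question (a user at 3150 points who just moved to 3155), using get_level from the script above:

    print(get_level(3150))  # 23 (level 23 starts at 3100 points, level 24 at 3300)
    print(get_level(3155))  # 23 -> the level did not change, so no update is needed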
I have 10+ features and tens of thousands of cases for training a logistic regression to classify people's nationality. The first example is French vs. non-French, and the second is English vs. non-English. The results are as follows:
//////////////////////////////////////////////////////
1= fr
0= non-fr
Class count:
0 69109
1 30891
dtype: int64
Accuracy: 0.95126
Classification report:
             precision    recall  f1-score   support

          0       0.97      0.96      0.96     34547
          1       0.92      0.93      0.92     15453

avg / total       0.95      0.95      0.95     50000

Confusion matrix:
[[33229  1318]
 [ 1119 14334]]
AUC = 0.944717975754
//////////////////////////////////////////////////////
1= en
0= non-en
Class count:
0 76125
1 23875
dtype: int64
Accuracy: 0.7675
Classification report:
             precision    recall  f1-score   support

          0       0.91      0.78      0.84     38245
          1       0.50      0.74      0.60     11755

avg / total       0.81      0.77      0.78     50000

Confusion matrix:
[[29677  8568]
 [ 3057  8698]]
AUC = 0.757955582999
//////////////////////////////////////////////////////
However, I am getting some very strange-looking ROC curves with triangular shapes instead of the usual jagged, rounded curves. Any explanation as to why I am getting such a shape? Is there a mistake I have made?
Code:
# Assumed setup (not shown in the question): my_dict ... my_dict16 are
# per-sample feature dicts, df is the source dataframe, y the label array,
# dv a DictVectorizer and lr a LogisticRegression from scikit-learn.
from sklearn.metrics import classification_report, roc_curve, auc
from sklearn.metrics import confusion_matrix as sk_confusion_matrix
import matplotlib.pyplot as plt

all_dict = []
for i in range(0, len(my_dict)):
    temp_dict = dict(my_dict[i].items() + my_dict2[i].items() + my_dict3[i].items() + my_dict4[i].items()
                     + my_dict5[i].items() + my_dict6[i].items() + my_dict7[i].items() + my_dict8[i].items()
                     + my_dict9[i].items() + my_dict10[i].items() + my_dict11[i].items() + my_dict12[i].items()
                     + my_dict13[i].items() + my_dict14[i].items() + my_dict15[i].items() + my_dict16[i].items()
                     )
    all_dict.append(temp_dict)
newX = dv.fit_transform(all_dict)
# Separate the training and testing data sets
half_cut = int(len(df)/2.0)*-1
X_train = newX[:half_cut]
X_test = newX[half_cut:]
y_train = y[:half_cut]
y_test = y[half_cut:]
# Fitting X and y into model, using training data
lr.fit(X_train, y_train)
# Making predictions using trained data
y_train_predictions = lr.predict(X_train)
y_test_predictions = lr.predict(X_test)
#print (y_train_predictions == y_train).sum().astype(float)/(y_train.shape[0])
print 'Accuracy:',(y_test_predictions == y_test).sum().astype(float)/(y_test.shape[0])
print 'Classification report:'
print classification_report(y_test, y_test_predictions)
#print sk_confusion_matrix(y_train, y_train_predictions)
print 'Confusion matrix:'
print sk_confusion_matrix(y_test, y_test_predictions)
#print y_test[1:20]
#print y_test_predictions[1:20]
#print y_test[1:10]
#print np.bincount(y_test)
#print np.bincount(y_test_predictions)
# Find and plot AUC
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_test_predictions)
roc_auc = auc(false_positive_rate, true_positive_rate)
print 'AUC=',roc_auc
plt.title('Receiver Operating Characteristic')
plt.plot(false_positive_rate, true_positive_rate, 'b', label='AUC = %0.2f'% roc_auc)
plt.legend(loc='lower right')
plt.plot([0,1],[0,1],'r--')
plt.xlim([-0.1,1.2])
plt.ylim([-0.1,1.2])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
You're doing it wrong. According to the documentation:
y_score : array, shape = [n_samples]
Target scores, can either be probability estimates of the positive class or confidence values.
Thus at this line:
roc_curve(y_test, y_test_predictions)
you should pass the result of decision_function (or one of the two columns of the predict_proba result) into the roc_curve function instead of the actual class predictions.
Look at these examples http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#example-model-selection-plot-roc-py
http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html#example-model-selection-plot-roc-crossval-py
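For completeness: with hard 0/1 predictions the ROC curve has only one interior operating point, so it degenerates into two straight segments, which is exactly the triangular shape described above. A minimal sketch of the fix, assuming lr is the fitted LogisticRegression from the question:

    # Pass continuous scores, not hard class predictions, to roc_curve.
    y_scores = lr.decision_function(X_test)   # or: lr.predict_proba(X_test)[:, 1]
    false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_scores)
    roc_auc = auc(false_positive_rate, true_positive_rate)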
I have a MATLAB curve from which I would like to find concentration values at 17 different time samples.
Below is the curve from which I would like to extract concentration values at 17 different time points.
The time points, in minutes, are:
t = 0, 0.25, 0.50, 1, 1.5, 2, 3, 4, 9, 14, 19, 24, 29, 34, 39, 44, 49
Here is the function I wrote to plot the graph above:
function c_t = output_function_constrainedK2(t, a1, a2, a3, b1, b2, b3, td, tmax, k1, k2, k3)
% Constrained-K2 model: returns the concentration curve c_t over time grid t
K_1 = (k1*k2)/(k2+k3);
K_2 = (k1*k3)/(k2+k3);
DV_free = k1/(k2+k3);  % computed but not used below
c_t = zeros(size(t));
% Linear rise between the delay td and the peak time tmax
ind = (t > td) & (t < tmax);
c_t(ind) = conv(((t(ind) - td) ./ (tmax - td) * (a1 + a2 + a3)), (K_1*exp(-(k2+k3)*t(ind)+K_2)), 'same');
% Tri-exponential decay after tmax
ind = (t >= tmax);
c_t(ind) = conv((a1 * exp(-b1 * (t(ind) - tmax)) + a2 * exp(-b2 * (t(ind) - tmax)) + a3 * exp(-b3 * (t(ind) - tmax))), (K_1*exp(-(k2+k3)*t(ind)+K_2)), 'same');
plot(t, c_t);
axis([0 50 0 1400]);
xlabel('Time [mins]');
ylabel('Concentration [MBq]');
title('Model: Constrained K2');
end
If possible, kindly suggest how I could alter the above function so that I can obtain concentration values at the 17 time points stated above.
These are the input values that I used to produce the curve:
output_function_constrainedK2(0:0.1:50, 2501, 18500, 65000, 0.5, 0.7, 0.3, 3, 8, 0.014, 0.051, 0.07)
This will give you concentration values at the time points you wanted. You will have to put this inside the output_function_constrainedK2 function so that you can access the variables t and c_t.
T=[0 0.25 0.50 1 1.5 2 3 4 9 14 19 24 29 34 39 44 49];
concentration=interp1(t,c_t,T)
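For reference, the same one-dimensional linear interpolation can be sketched outside MATLAB in Python with numpy; the curve below is a hypothetical placeholder, since in practice you would use the t and c_t computed by output_function_constrainedK2:

    import numpy as np

    T = [0, 0.25, 0.50, 1, 1.5, 2, 3, 4, 9, 14, 19, 24, 29, 34, 39, 44, 49]  # minutes

    t = np.arange(0, 50.1, 0.1)                      # model time grid, as in the question
    c_t = np.interp(t, [0, 8, 50], [0, 1400, 200])   # placeholder curve for illustration only

    concentration = np.interp(T, t, c_t)  # the numpy equivalent of interp1(t, c_t, T)
    print(concentration)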
Say I have the following Bayesian network:
And I want to classify a new instance on whether H=true or H=false;
the new instance looks e.g. like this: Fl=true, A=false, S=true, and Ti=false.
How can I classify the instance with respect to H?
I can compute a probability by multiplying the probabilities from the tables:
0.4 * 0.7 * 0.5 * 0.2 = 0.028
What does this say about whether the new instance is a positive instance of H or not?
EDIT
I will try to compute the probability according to Bernhard Kausler's suggestion:
So this is Bayes' rule:
P(H|S,Ti,Fi,A) = P(H,S,Ti,Fi,A) / P(S,Ti,Fi,A)
To compute the denominator:
P(S,Ti,Fi,A) = P(H=T,S,Ti,Fi,A) + P(H=F,S,Ti,Fi,A) = (0.7 * 0.5 * 0.8 * 0.4 * 0.3) + (0.3 * 0.5 * 0.8 * 0.4 * 0.3) = 0.048
P(H=true,S,Ti,Fi,A) = 0.0336
so P(H=true|S,Ti,Fi,A) = 0.0336 / 0.048 = 0.7
Now I compute P(H=false|S,Ti,Fi,A) = P(H=false,S,Ti,Fi,A) / P(S,Ti,Fi,A).
We already have the value for P(S,Ti,Fi,A). It's 0.048.
P(H=false,S,Ti,Fi,A) = 0.0144
so P(H=false|S,Ti,Fi,A) = 0.0144 / 0.048 = 0.3
The probability P(H=true|S,Ti,Fi,A) is the higher of the two, so the new instance is classified as H=true.
Is this correct?
Addition: We do not need to calculate P(H=false|S,Ti,Fi,A) because it is 1 - P(H=true|S,Ti,Fi,A).
So, you want to compute the conditional probability P(H|S,Ti,Fi,A). To do that, you have to use Bayes' rule:
P(H|S,Ti,Fi,A) = P(H,S,Ti,Fi,A) / P(S,Ti,Fi,A)
where
P(S,Ti,Fi,A) = P(H=T,S,Ti,Fi,A)+P(H=F,S,Ti,Fi,A)
You then calculate both conditional probabilities P(H=T|S,Ti,Fi,A) and P(H=F|S,Ti,Fi,A) and make a prediction according to which probability is higher.
Just multiplying up the numbers like you did won't help, and it doesn't even give you a proper probability, since the product is not normalized.
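A short Python sketch of that calculation, using the numbers from the question's EDIT (the 0.7/0.3 factors are the prior on H; the remaining factors are the evidence values the poster read off the tables, which are not shown here):

    joint_h_true  = 0.7 * 0.5 * 0.8 * 0.4 * 0.3   # P(H=T, S, Ti, Fi, A) = 0.0336
    joint_h_false = 0.3 * 0.5 * 0.8 * 0.4 * 0.3   # P(H=F, S, Ti, Fi, A) = 0.0144
    evidence = joint_h_true + joint_h_false        # P(S, Ti, Fi, A) = 0.048
    print(joint_h_true / evidence)                 # P(H=T | evidence) = 0.7 -> predict H=true
    print(joint_h_false / evidence)                # P(H=F | evidence) = 0.3 (= 1 - 0.7)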