I have an ongoing scoreboard with a friend for a game we play. It looks like this:
A B C D E F
+-----------------------------+-------+------+--------+--------+------------+
1 | Through the Ages Scoreboard | | | | | |
+-----------------------------+-------+------+--------+--------+------------+
2 | Game title | Kevin | M | First? | Winner | Difference |
+-----------------------------+-------+------+--------+--------+------------+
3 | thekoalaz's Game | 174 | 213 | Kevin | M | 39 |
4 | Game #0 | 242 | 126 | Kevin | Kevin | 116 |
5 | Game #1 | 105 | 146 | Kevin | M | 41 |
6 | Game #2 | 158 | 135 | Kevin | Kevin | 23 |
7 | Game #3 | 149 | 145 | M | Kevin | 4 |
8 | Game #4 | 91 | 145 | Kevin | M | 54 |
9 | Game #5 | 211 | 187 | M | Kevin | 24 |
10 | Game #6 | 160 | 158 | M | Kevin | 2 |
11 | Game #7 | 154 | 215 | Kevin | M | 61 |
12 | Game #8 | 169 | 177 | M | M | 8 |
13 | Game #9 | 135 | 129 | M | Kevin | 6 |
14 | Game #10 | 156 | 262 | Kevin | M | 106 |
15 | Game #11 | 205 | 171 | M | Kevin | 34 |
16 | Game #12 (2) | 186 | 203 | Kevin | M | 17 |
17 | | | | | | |
+-----------------------------+-------+------+--------+--------+------------+
Where there's space at the end of the board to add scores for future games.
How do I count how many times the player who goes first wins? In this case it should be 3: D4 = E4, D6 = E6, D12 = E12. Is it possible to do this in a single formula? I'd also like adding future game scores to "just work" with this.
Here, first is {K;K;K;K;M;K;M;M;K;M;M;K;M;K}
And winner is {M;K;M;K;K;M;K;K;M;M;K;M;K;M}
I tried =COUNTIF($E$3:$E, $D$3:$D), but this gives me 7, which I presume is the same as =COUNTIF($E$3:$E, $D$3), without the ranged criteria.
Other ranged criteria questions didn't seem to focus on this 1:1 necessity (or maybe I don't know how to word it).
Here's what I used:
=SUMPRODUCT(D3:D=E3:E, E3:E<>"")
Let's break it down.
D3:D=E3:E (also expressible as EQ(D3:D, E3:E)) - equality. I tried to figure out the concept of testing equality of ranges, but the best thing I could find was Microsoft's tutorial on array formulas. What I can say is that if you just put =D3:D=E3:E in your Google sheet, it will show just one of the results--the one that matches the row it's in. It requires =ArrayFormula(D3:D=E3:E) to be entered as the full array of equality results.
SUMPRODUCT - Sums the products of corresponding array elements across multiple arrays. For example, SUMPRODUCT({1,3}, {2,4}) = 1*2 + 3*4 = 14. If used with one array, it just aggregates that array's values. TRUE=1 and FALSE=0, so when applied to the array formula above, it counts how many times D3:D=E3:E is true. Ranges work as arrays, which may be why wrapping the equality in ArrayFormula(...) isn't necessary here.
E3:E<>"" - Another array formula testing if the E cell is not empty (<> is the "not equals" sign). Because I want this to automatically work for any new entries, D3:D=E3:E will evaluate true for any empty entries (empty=empty). Mutliplying these two array formulas together is effectively an AND operator--"sum this if Dn=En AND En is not empty". To convince you, here are the truth tables:
+-----+---+---+ +------+---+---+
| AND | T | F | | MULT | 1 | 0 |
+-----+---+---+ +------+---+---+
| T | T | F | | 1 | 1 | 0 |
| F | F | F | | 0 | 0 | 0 |
+-----+---+---+ +------+---+---+
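On the scoreboard above, D3:D=E3:E evaluates to {FALSE; TRUE; FALSE; TRUE; FALSE; FALSE; FALSE; FALSE; FALSE; TRUE; FALSE; FALSE; FALSE; FALSE} for rows 3-16, and to TRUE for every empty row below (empty=empty), while E3:E<>"" is TRUE for rows 3-16 and FALSE below. Their product is 1 only at rows 4, 6 and 12, so SUMPRODUCT returns 3, as expected.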
I'm having a hard time getting a regressor to work correctly using a custom loss function.
I'm currently using several datasets which contain data for transprecision computing benchmark experiments, here's a snippet from one of them:
| var_0 | var_1 | var_2 | var_3 | err_ds_0 | err_ds_1 | err_ds_2 | err_ds_3 | err_ds_4 | err_mean | err_std |
|-------|-------|-------|-------|---------------|---------------|---------------|---------------|---------------|----------------|-------------------|
| 27 | 45 | 35 | 40 | 16.0258634564 | 15.9905086513 | 15.9665402702 | 15.9654006879 | 15.9920739469 | 15.98807740254 | 0.02203520210917 |
| 42 | 23 | 4 | 10 | 0.82257142551 | 0.91889119458 | 0.93573069325 | 0.81276879271 | 0.87065388914 | 0.872123199038 | 0.049423964650445 |
| 7 | 52 | 45 | 4 | 2.39566262913 | 2.4233107563 | 2.45756544291 | 2.37961745294 | 2.42859839621 | 2.416950935498 | 0.027102139332226 |
(Sorry in advance for the markdown table, couldn't find a better way to do this)
Each err_ds_* column is obtained from a different benchmark execution, using the specified var_* configuration (each var contains the number of bits of precision used for a specific variable); each error cell actually contains the negative natural logarithm of the error (since the actual values are really small), and the err_mean and err_std for each row are calculated from these values.
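For example, the err_mean and err_std of the first row can be reproduced with numpy; a minimal check (note the table's err_std appears to be the population standard deviation, i.e. ddof=0):

import numpy as np

# err_ds_* values of the first row of the snippet above
errs = np.array([16.0258634564, 15.9905086513, 15.9665402702,
                 15.9654006879, 15.9920739469])
print(errs.mean())       # 15.98807740254  -> err_mean
print(errs.std(ddof=0))  # ~0.0220352021   -> err_std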
During data preparation for the network, I reshape the dataset so that each benchmark execution is a separate row (which means we're going to have multiple rows with the same var_* values, but a different error value); then I separate the data (what we usually give to the fit function as x) and the target (what we usually give to the fit function as y), to obtain, respectively:
| var_0 | var_1 | var_2 | var_3 |
|-------|-------|-------|-------|
| 27 | 45 | 35 | 40 |
| 27 | 45 | 35 | 40 |
| 27 | 45 | 35 | 40 |
| 27 | 45 | 35 | 40 |
| 27 | 45 | 35 | 40 |
| 42 | 23 | 4 | 10 |
| 42 | 23 | 4 | 10 |
| 42 | 23 | 4 | 10 |
| 42 | 23 | 4 | 10 |
| 42 | 23 | 4 | 10 |
| 7 | 52 | 45 | 4 |
| 7 | 52 | 45 | 4 |
| 7 | 52 | 45 | 4 |
| 7 | 52 | 45 | 4 |
| 7 | 52 | 45 | 4 |
and
| log_err |
|---------------|
| 16.0258634564 |
| 15.9905086513 |
| 15.9665402702 |
| 15.9654006879 |
| 15.9920739469 |
| 0.82257142551 |
| 0.91889119458 |
| 0.93573069325 |
| 0.81276879271 |
| 0.87065388914 |
| 2.39566262913 |
| 2.4233107563 |
| 2.45756544291 |
| 2.37961745294 |
| 2.42859839621 |
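This wide-to-long reshape can be expressed, for instance, with pandas.melt (a sketch only; df is assumed to hold the original wide dataset, and only two rows and two err columns are spelled out):

import pandas as pd

df = pd.DataFrame({
    'var_0': [27, 42], 'var_1': [45, 23], 'var_2': [35, 4], 'var_3': [40, 10],
    'err_ds_0': [16.0258634564, 0.82257142551],
    'err_ds_1': [15.9905086513, 0.91889119458],
    # ... remaining err_ds_* columns omitted for brevity
})

# wide -> long: one row per (configuration, benchmark execution)
long = (df.melt(id_vars=['var_0', 'var_1', 'var_2', 'var_3'],
                value_vars=[c for c in df.columns if c.startswith('err_ds_')],
                value_name='log_err')
          .drop(columns='variable'))

data = long[['var_0', 'var_1', 'var_2', 'var_3']]  # x for fit()
target = long[['log_err']]                         # y for fit()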
Finally we split the set again into train data (which we're going to call train_data_regr and train_target_tensor) and test data (which we're going to call test_data_regr and test_target_tensor), all of which are scaled using scaler_regr_*.fit_transform(df) (where the scaler_regr_* are StandardScaler() instances from sklearn.preprocessing), and fed into the network:
import numpy as np
from keras import backend as K
from keras import optimizers, regularizers
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, TerminateOnNaN
from keras.layers import Dense
from keras.models import Sequential

n_features = train_data_regr.shape[1]
input_shape = (n_features,)
pred_model = Sequential()
# Input layer
pred_model.add(Dense(n_features * 3, activation='relu',
                     activity_regularizer=regularizers.l1(1e-5),
                     input_shape=input_shape))
# Hidden dense layers
pred_model.add(Dense(n_features * 8, activation='relu',
                     activity_regularizer=regularizers.l1(1e-5)))
pred_model.add(Dense(n_features * 4, activation='relu',
                     activity_regularizer=regularizers.l1(1e-5)))
# Output layer (two neurons: one for the mean, one for the log-variance used by the loss)
pred_model.add(Dense(2, activation='linear'))

# Loss function: the first half of the outputs is mu, the second half log(sigma^2)
def neg_log_likelihood_loss(y_true, y_pred):
    sep = y_pred.shape[1] // 2
    mu, logvar = y_pred[:, :sep], y_pred[:, sep:]
    return K.sum(0.5 * (logvar + np.log(2 * np.pi)
                        + K.square((y_true - mu) / K.exp(0.5 * logvar))),
                 axis=-1)

# Callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=10, min_delta=1e-5)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', patience=5, min_lr=1e-5, factor=0.2)
terminate_nan = TerminateOnNaN()

# Compiling
adam = optimizers.Adam(lr=0.001, decay=0.005)
pred_model.compile(optimizer=adam, loss=neg_log_likelihood_loss)

# Training
history = pred_model.fit(train_data_regr, train_target_tensor,
                         epochs=20, batch_size=64, shuffle=True,
                         validation_split=0.1, verbose=True,
                         callbacks=[early_stopping, reduce_lr, terminate_nan])

predicted = pred_model.predict(test_data_regr)
actual = test_target_regr
actual_rescaled = scaler_regr_target.inverse_transform(actual)
predicted_rescaled = scaler_regr_target.inverse_transform(predicted)
test_data_rescaled = scaler_regr_data.inverse_transform(test_data_regr)
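Written out, the loss is intended to be the Gaussian negative log-likelihood, with the second output read as log(sigma^2), which is why the output layer has two neurons:

NLL(y; mu, logvar) = 0.5 * (logvar + log(2*pi) + (y - mu)^2 / exp(logvar))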
Finally the obtained data is evaluated by a custom function, which compares actual data with predicted data (namely true mean vs predicted mean and true std vs predicted std) using several metrics (like MAE and MSE), and plots the result with matplotlib.
The idea is that the two outputs of the network are going to predict the mean and the std of the error, given a var_* configuration as input.
Now, let's get to the question: with this code I'm getting very good results predicting the mean (even with different benchmarks), but terrible results predicting the std, so I wanted to ask if this is the right way to predict the two values. I'm sure I'm missing something very basic here, but after two weeks I think I'm stuck for good.
I have created this Sheet to test this formula:
| | A | B | C | D |
| 1 | Object | Yes | Maybe | No |
| 2 | Object 1 | 50 | 25 | 0 |
| 3 | Object 2 | 20 | 10 | 0 |
| 4 | Object 3 | 20 | 10 | 0 |
| 5 | Object 4 | 10 | 5 | 0 |
Rules
| | A | B | C | D | E | F | G |
| 1 | Article | Object 1 | Object 2 | Object 3 | Object 4 | Total | |
| 2 | Article 1 | 50 | 20 | 20 | 10 | 100 | |
| 3 | Article 2 | Yes | Yes | Yes | Yes | 100 | |
| 4 | Article 3 | Yes | No | No | Yes | 60 | |
| 5 | Article 4 | No | Yes | Yes | No | 40 | |
| 6 | Test | No | Yes | Yes | No | #VALUE! | |
| 7 | Test2 | Yes | Yes | No | Yes | 50 | |
| 8 | Test3 | Yes | Yes | No | Yes | 70 | * |
* This works partially, but if No is selected the next Yes isn't calculated, and it breaks if the first Object is not Yes. The example shows 70 but should be 80.
Sheet
https://docs.google.com/spreadsheets/d/1ydSfa4dpkTdcvwPPqGLRdQ9r-JZstB-hYS7J7tondUs/edit?usp=sharing
What I want to achieve is that the Yes/No selections in Sheet! pick up the corresponding values listed in Rules! when the SUM is added up.
For example in Sheet!, if I select Yes, Yes, No, Yes it should add up to 50 + 20 + 0 + 10 = 80. As the first Yes equals 50, followed by 20, 20, 10, and any No equals 0.
I know only very basic spreadsheet formulas; what I have tried so far is the following, and it is also where I get stuck.
I want it to read B8 through E8, see how many Yes is listed, and if any Yes is listed, compare it to B2 through B5.
=SUMIF(B8:E8,"Yes",Rules!B2:B5)
The closest I have come is by ignoring the Rules sheet and to put the rules directly into the formula by repeating IF statements. This way it works, but I would still prefer to have the rules set by the Rules sheet.
=IF(B10="Yes",50+IF(C10="Yes",20+IF(D10="Yes",20+IF(E10="Yes",10,0))))
What I tried is probably very incorrect, but as I said I have no idea how to proceed or fix it.
Does anyone have a suggestion for me?
If you need further explanation of what I want to achieve, or something is not clear, please let me know and I will try to explain.
Maybe you are after SUMPRODUCT:
=sumproduct(B$2:E$2,B3:E3="Yes")
in F3 of Sheet, copied down to suit, or perhaps:
=sumproduct(transpose(Rules!B$2:B$5),B3:E3="Yes")
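To see why this works, take the example row Yes, Yes, No, Yes: B3:E3="Yes" evaluates to {TRUE, TRUE, FALSE, TRUE}, so the first formula computes 50*1 + 20*1 + 20*0 + 10*1 = 80. The TRANSPOSE in the second version is needed because Rules!B$2:B$5 is a column, and it has to be turned into a row to line up element-by-element with B3:E3.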
I'm not able to get a meaningful accuracy: every dataset I provide gives 100% accuracy for every classifier algorithm I apply. My data set consists of 10 people.
It gives the same accuracy for the Naive Bayes, J48 and JRip classifier algorithms.
+----+-------+----+----+----+----+----+-----+----+------+-------+-------+-------+
| id | name | q1 | q2 | q3 | m1 | m2 | tut | fl | proj | fexam | total | grade |
+----+-------+----+----+----+----+----+-----+----+------+-------+-------+-------+
| 1 | abv | 5 | 5 | 5 | 13 | 13 | 4 | 8 | 7 | 40 | 100 | p |
| 2 | ca | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 40 | 48 | f |
| 3 | ga | 4 | 2 | 3 | 5 | 10 | 4 | 5 | 6 | 20 | 59 | f |
| 4 | ui | 5 | 4 | 4 | 12 | 13 | 3 | 7 | 7 | 39 | 94 | p |
| 5 | pa | 4 | 1 | 1 | 4 | 3 | 2 | 4 | 5 | 22 | 46 | f |
| 6 | la | 2 | 3 | 1 | 1 | 2 | 0 | 4 | 2 | 11 | 26 | f |
| 7 | ka | 5 | 4 | 1 | 3 | 3 | 1 | 6 | 4 | 24 | 51 | f |
| 8 | ma | 5 | 3 | 3 | 9 | 8 | 4 | 8 | 0 | 20 | 60 | p |
| 9 | ash | 2 | 5 | 5 | 11 | 12 | 3 | 7 | 6 | 30 | 81 | p |
| 10 | opo | 4 | 2 | 1 | 13 | 1 | 3 | 7 | 3 | 35 | 69 | p |
+----+-------+----+----+----+----+----+-----+----+------+-------+-------+-------+
Make sure to not include any unique identifier column.
Also don't include the total.
Most likely, the classifiers learned that "name" is a good predictor and/or that you need total > 59 points to pass.
Because of that, I suggest you even withhold at least one exercise - some classifiers will still learn that the sum of the individual points determines passing.
I assume you want to find out if one part is most indicative of passing, i.e. "if you do well on part 3, you will likely pass". But to answer this question, you need to account for e.g. different amounts of points per question, etc. - otherwise, your predictor will just identify which question has the most points...
Also, 10 is much too small a sample size!
You can see from the displayed output that the tree J48 generated used only the variable fl, so I do not think that you have the problem that @Anony-Mousse referred to.
I notice that you are testing on the training set (see the "Test Options" radio buttons at upper left of the GUI). That almost always overestimates the accuracy. What you are seeing is overfitting. Instead, use cross-validation to get a better estimate of the accuracy you could expect on new data. With only 10 data points, you should use either 10 folds or 5.
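For illustration only (outside Weka), here is the same idea in scikit-learn; the feature matrix X and labels y are made-up placeholders shaped like the grade data, and DecisionTreeClassifier is just a rough stand-in for J48:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 14, size=(10, 9))  # placeholder for q1..q3, m1, m2, tut, fl, proj, fexam
y = np.array(['p', 'f', 'f', 'p', 'f', 'f', 'f', 'p', 'p', 'p'])  # grades from the table

# 5-fold cross-validation: every row is predicted by a model that never saw it,
# unlike evaluating on the training set
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores, scores.mean())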
Try testing your model with cross-validation ("k splits") or a percentage split.
Generally in a percentage split, the training set is 2/3 of the dataset and the test set is 1/3.
Also, I feel that your dataset is very small... there is a high chance of (apparently) high accuracy in that case.
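A sketch of the percentage split with the same placeholder data (again scikit-learn, not Weka):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 14, size=(10, 9))  # placeholder features, as in the sketch above
y = np.array(['p', 'f', 'f', 'p', 'f', 'f', 'f', 'p', 'p', 'p'])

# 2/3 of the rows train the model, 1/3 test it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0, stratify=y)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on rows the model never saw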
My application generates logs and sends them to syslog-ng.
I want to write a custom template/parser/filter for use in syslog-ng to correctly store the fields in tables of an SQLite database (MyDatabase).
This is the legend of my log:
unique-record-id usename date Quantity BOQ possible,item,profiles Count Vendor applicable,vendor,categories known,request,types vendor_code credit
All 12 fields are tab-separated, and the parser must store them into 12 columns of table MyTable1 in MyDatabase.
Some of the fields - the 6th, 9th and 10th - however also contain "sub-fields" as comma-separated values.
The number of values within each of these sub-fields, is variable, and can change in each line of log.
I need these fields to be stored in respective separate tables
MyItem_type, MyVendor_groups, MyReqs
These "secondary" tables have 3 columns, record the Unique-Record-ID, and Quantity against each of their occurence in the log
So the schema in MyItem_type table looks like:
Unique-Record-ID | item_profile | Quantity
Similarly the schema of MyVendor_groups looks like:
Unique-Record-ID | vendor_category | Quantity
and the schema of MyReqs looks like:
Unique-Record-ID | req_type | Quantity
Consider these sample lines from the log:
unique-record-id usename date Quantity BOQ possible,item,profiles Count Vendor applicable,vendor,categories known,request,types vendor_code credit
234.44.tfhj Sam 22-03-2016 22 prod1 cat1,cat22,cat36,cat44 66 ven1 t1,t33,t43,t49 req1,req2,req3,req4 blue 64.22
234.45.tfhj Alex 23-03-2016 100 prod2 cat10,cat36,cat42 104 ven1 t22,t45 req1,req2,req33,req5 red 66
234.44.tfhj Vikas 24-03-2016 88 prod1 cat101,cat316,cat43 22 ven2 t22,t43 req1,req23,req3,req6 red 77.12
234.47.tfhj Jane 25-03-2016 22 prod7 cat10,cat36,cat44 43 ven3 t77 req1,req24,req3,req7 green 45.89
234.48.tfhj John 26-03-2016 97 serv3 cat101,cat36,cat45 69 ven5 t1 req11,req2,req3,req8 orange 33.04
234.49.tfhj Ruby 27-03-2016 85 prod58 cat10,cat38,cat46 88 ven9 t33,t55,t99 req1,req24,req3,req9 white 46.04
234.50.tfhj Ahmed 28-03-2016 44 serv7 cat110,cat36,cat47 34 ven11 t22,t43,t77 req1,req20,req3,req10 red 43
My parser should store the above log into MyDatabase.Mytable1 as:
unique-record-id | usename | date | Quantity | BOQ | item_profile | Count | Vendor | vendor_category | req_type | vendor_code | credit
234.44.tfhj | Sam | 22-03-2016 | 22 | prod1 | cat1,cat22,cat36,cat44 | 66 | ven1 | t1,t33,t43,t49 | req1,req2,req3,req4 | blue | 64.22
234.45.tfhj | Alex | 23-03-2016 | 100 | prod2 | cat10,cat36,cat42 | 104 | ven1 | t22,t45 | req1,req2,req33,req5 | red | 66
234.44.tfhj | Vikas | 24-03-2016 | 88 | prod1 | cat101,cat316,cat43 | 22 | ven2 | t22,t43 | req1,req23,req3,req6 | red | 77.12
234.47.tfhj | Jane | 25-03-2016 | 22 | prod7 | cat10,cat36,cat44 | 43 | ven3 | t77 | req1,req24,req3,req7 | green | 45.89
234.48.tfhj | John | 26-03-2016 | 97 | serv3 | cat101,cat36,cat45 | 69 | ven5 | t1 | req11,req2,req3,req8 | orange | 33.04
234.49.tfhj | Ruby | 27-03-2016 | 85 | prod58 | cat10,cat38,cat46 | 88 | ven9 | t33,t55,t99 | req1,req24,req3,req9 | white | 46.04
234.50.tfhj | Ahmed | 28-03-2016 | 44 | serv7 | cat110,cat36,cat47 | 34 | ven11 | t22,t43,t77 | req1,req20,req3,req10 | red | 43
And also parse the "possible,item,profiles" to record into MyDatabase.MyItem_type as:
Unique-Record-ID | item_profile | Quantity
234.44.tfhj | cat1 | 22
234.44.tfhj | cat22 | 22
234.44.tfhj | cat36 | 22
234.44.tfhj | cat44 | 22
234.45.tfhj | cat10 | 100
234.45.tfhj | cat36 | 100
234.45.tfhj | cat42 | 100
234.44.tfhj | cat101 | 88
234.44.tfhj | cat316 | 88
234.44.tfhj | cat43 | 88
234.47.tfhj | cat10 | 22
234.47.tfhj | cat36 | 22
234.47.tfhj | cat44 | 22
234.48.tfhj | cat101 | 97
234.48.tfhj | cat36 | 97
234.48.tfhj | cat45 | 97
234.49.tfhj | cat10 | 85
234.49.tfhj | cat38 | 85
234.49.tfhj | cat46 | 85
234.50.tfhj | cat110 | 44
234.50.tfhj | cat36 | 44
234.50.tfhj | cat47 | 44
We also need to similarly parse "applicable,vendor,categories" and store them into MyDatabase.MyVendor_groups, and parse "known,request,types" for storage into MyDatabase.MyReqs. The first column for MyDatabase.MyItem_type, MyDatabase.MyVendor_groups and MyDatabase.MyReqs will always be the Unique-Record-ID that was witnessed in the log.
So yes, unlike in the other tables, this column does not contain unique data in these three tables.
The third column will always be the Quantity that was witnessed in the log.
I know a bit of PCRE, but it is the use of nested parsers in syslog-ng that's completely confusing me.
The syslog-ng documentation suggests this is possible, but I simply failed to find a good example. If anyone has a reference or sample to share, it would be very useful.
Thanks in advance.
I think all of these can be done using the csv-parser a few times.
First, use a csv-parser with the tab delimiter("\t") to split the initial fields into named columns. Use this parser on the entire message.
Then you'll have to parse the fields that have subfields using other instances of the csv-parser on the columns that need further parsing.
You can find some examples at https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/csv-parser.html and https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/reference-parsers-csv.html
(It is possible that you could get it done with a single parser by specifying both the tab and the comma as delimiters, but that might not work for the fields with a variable number of sub-fields.)
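Not syslog-ng config, but to make the two-stage logic concrete, here is the same split sketched in Python on one of your sample lines (field positions taken from the legend above):

line = ("234.44.tfhj\tSam\t22-03-2016\t22\tprod1\tcat1,cat22,cat36,cat44"
        "\t66\tven1\tt1,t33,t43,t49\treq1,req2,req3,req4\tblue\t64.22")

cols = line.split("\t")                # stage 1: the 12 tab-separated fields
record_id, quantity = cols[0], cols[3]

# stage 2: the comma-separated sub-fields (6th, 9th and 10th fields)
item_rows   = [(record_id, profile, quantity) for profile in cols[5].split(",")]
vendor_rows = [(record_id, category, quantity) for category in cols[8].split(",")]
req_rows    = [(record_id, req, quantity) for req in cols[9].split(",")]

print(item_rows[0])  # ('234.44.tfhj', 'cat1', '22') -> a MyItem_type row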
INVOICE
So I have to put this into 1NF, 2NF and 3NF.
PROD_NUM PROD_LABEL PROD_PRICE
AA-E3422QW ROTARY SANDER 49.95
AA-E3422QW ROTARY SANDER 49.95
QD-300932X 0.25IN. DRILL BIT 3.45
RU-95748G BAND SAW 33.99
GH-778345P POWER DRILL 87.75
VEN_CODE VEN_NAME
211 NEVERFAIL, INC
211 NEVERFAIL, INC
211 NEVERFAIL, INC
309 BEGOOD, INC
157 TOUGHGO, INC
So far I have these as my 2NF. Am I going right? And how do I put the table into 3NF?
So my 2NF will be like this? 2NF TABLE IMAGE
I think the picture you were given is considered 1NF.
And you initially showed 3NF, but you'll need an additional table to reference which Product is from which Vendor, as well as to modify the invoice table.
Vendor - Unique list of vendors
VEN_ID | VEN_CODE | VEN_NAME
-------|----------|---------------
1 | 211 | NEVERFAIL, INC
2 | 309 | BEGOOD, INC
3 | 157 | TOUGHGO, INC
Product - Unique list of products
PROD_ID | PROD_NUM | PROD_LABEL | PROD_PRICE
--------|------------|-------------------|-----------
1 | AA-E3422QW | ROTARY SANDER | 49.95
2 | QD-300932X | 0.25IN. DRILL BIT | 3.45
3 | RU-95748G | BAND SAW | 33.99
4 | GH-778345P | POWER DRILL | 87.75
Vendor_Product - the mapping between products and vendors
VEN_ID | PROD_ID
-------|----------
1 | 1
1 | 2
2 | 3
3 | 4
Purchases - The transactions that happened
PURCH_ID | INV_NUM | SALE_DATE | PROD_ID | QUANT_SOLD
---------|---------|-------------|---------|------------
1 | 211347 | 15-JAN-2006 | 1 | 1
2 | 211347 | 15-JAN-2006 | 2 | 8
3 | 211347 | 15-JAN-2006 | 3 | 1
4 | 211348 | 15-JAN-2006 | 1 | 2
5 | 211349 | 16-JAN-2006 | 4 | 1
I think that is good, but it can be split again.
Invoices - A unique list of invoices
INV_ID | INV_NUM | SALE_DATE
--------|---------|-------------
1 | 211347 | 15-JAN-2006
2 | 211348 | 15-JAN-2006
3 | 211349 | 16-JAN-2006
Purchases - The transactions that happened
PURCH_ID | INV_ID | PROD_ID | QUANT_SOLD
---------|--------|---------|---------
1 | 1 | 1 | 1
2 | 1 | 2 | 8
3 | 1 | 3 | 1
4 | 2 | 1 | 2
5 | 3 | 4 | 1
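For what it's worth, here is a sketch of this final schema as SQLite DDL (table and column names taken from the tables above; the types and foreign keys are my own assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vendor (                  -- unique list of vendors
    VEN_ID   INTEGER PRIMARY KEY,
    VEN_CODE INTEGER,
    VEN_NAME TEXT
);
CREATE TABLE Product (                 -- unique list of products
    PROD_ID    INTEGER PRIMARY KEY,
    PROD_NUM   TEXT,
    PROD_LABEL TEXT,
    PROD_PRICE REAL
);
CREATE TABLE Vendor_Product (          -- mapping between products and vendors
    VEN_ID  INTEGER REFERENCES Vendor(VEN_ID),
    PROD_ID INTEGER REFERENCES Product(PROD_ID)
);
CREATE TABLE Invoices (                -- unique list of invoices
    INV_ID    INTEGER PRIMARY KEY,
    INV_NUM   INTEGER,
    SALE_DATE TEXT
);
CREATE TABLE Purchases (               -- the transactions that happened
    PURCH_ID   INTEGER PRIMARY KEY,
    INV_ID     INTEGER REFERENCES Invoices(INV_ID),
    PROD_ID    INTEGER REFERENCES Product(PROD_ID),
    QUANT_SOLD INTEGER
);
""")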
To get 2NF, combine the Vendor information back into the Product table, with these columns:
PROD_ID | PROD_NUM | PROD_LABEL | PROD_PRICE | VEN_CODE | VEN_NAME
In this case, the Vendor and Vendor_Product tables aren't needed.