Google Sheets Countif with Arrayformula - google-sheets

I'm running a dynamic Monte Carlo simulation in Google Sheets and use the COUNTIF formula for the counting step. Something is not working the way I thought it would, but I cannot put my finger on it. I have two columns that I'm comparing, and I need to count the rows where the value in one column is bigger than the value in the other column. If I do this explicitly by filling the comparison formula down a helper column, I obtain the correct result. However, if I do it with
=countif( A4:A, ">" & B4:B )
I do not obtain the correct result. My example is at this sheet, the number in cell C4 is the malfunctioning COUNTIF, which equals 2 in the example, and the number in cell E4 is 5, which is the correct count by propagating the comparison in column F and adding the correct comparisons in E4.
p1    p2    n
0.5   0.51  10
Monte Carlo
0.50  0.60  2  5  0
0.90  0.50        1
0.60  0.30        1
0.50  0.60        0
0.40  0.30        1
0.40  0.50        0
0.60  0.70        0
0.60  0.30        1
0.70  0.50        1
0.10  0.30        0

There are two scenarios with countif:
(1) As a non-array formula, =countif( A4:A, ">" & B4:B ) would give you the same result as =countif( A4:A, ">" & B4 ) i.e. it would count only values of A greater than .60, giving the answer 2.
(2) As an array formula, =sum(countif( A4:A, ">" & B4:B )) would give you a separate result for each value of B (2+5+9+2...) giving the answer 56.
If you wanted to use countif, you would need to do something like this:
=ArrayFormula(countif(A4:A-B4:B,">"&0))
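The three counts discussed above can be reproduced outside Sheets. The following is only a sketch in Python using the ten rows from the question; the helper name countif_greater is made up for illustration and merely mimics a COUNTIF with a single ">" criterion.

```python
# Values copied from columns A and B of the question's example (rows 4 to 13).
a = [0.50, 0.90, 0.60, 0.50, 0.40, 0.40, 0.60, 0.60, 0.70, 0.10]
b = [0.60, 0.50, 0.30, 0.60, 0.30, 0.50, 0.70, 0.30, 0.50, 0.30]


def countif_greater(values, threshold):
    """Hypothetical helper: count entries strictly greater than one threshold."""
    return sum(1 for v in values if v > threshold)


# Scenario 1: only the first criterion (B4) is used.
first_only = countif_greater(a, b[0])

# Scenario 2: one COUNTIF per value of B, then summed.
per_criterion = [countif_greater(a, t) for t in b]
summed = sum(per_criterion)

# Intended result: compare the two columns row by row.
row_wise = sum(1 for x, y in zip(a, b) if x > y)

print(first_only, summed, row_wise)  # 2 56 5
```

Only the row-by-row comparison matches the 5 in cell E4; subtracting the columns before calling COUNTIF achieves the same pairing.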

try:
=INDEX(SUM(IF(A4:A>B4:B, 1)))


Table printing a list of lists in Common Lisp

I wish to print this data in a table with the columns aligned. I tried with format, but the columns were not aligned. Does anyone know how to do it? Thank you.
(("tiscali" 10000 2.31 0.84 -14700.0 "none")
("atlantia" 50 22.65 22.68 1.5 "none")
("bper-banca" 1000 1.59 2.01 423.0 "none")
("alerion-cleanpower" 30 44.14 36.45 -230.7 "none")
("tesmec" 10000 0.12 0.14 150.0 "none")
("cover-50" 120 8.95 9.6 78.0 "none")
("ovs" 1000 1.71 1.93 217.0 "none")
("credito-emiliano" 200 5.7 6.26 112.0 "none"))
I tried to align the columns with the ~T directive, but it did not work. Is there a piece of code that prints table data nicely?
Let's break this down.
First, let's give your data a nice name:
(defparameter *data*
  '(("tiscali" 10000 2.31 0.84 -14700.0 "none")
    ("atlantia" 50 22.65 22.68 1.5 "none")
    ("bper-banca" 1000 1.59 2.01 423.0 "none")
    ("alerion-cleanpower" 30 44.14 36.45 -230.7 "none")
    ("tesmec" 10000 0.12 0.14 150.0 "none")
    ("cover-50" 120 8.95 9.6 78.0 "none")
    ("ovs" 1000 1.71 1.93 217.0 "none")
    ("credito-emiliano" 200 5.7 6.26 112.0 "none")))
Now, come up with a way to print each line using format and destructuring-bind. Widths of various fields are hard-coded in.
(defun print-line (line)
  (destructuring-bind (a b c d e f) line
    (format T "~20a ~5d ~6,2f ~6,2f ~10,2f ~4a~%" a b c d e f)))
Once you know you can print a line, you just need to do that for each line.
;; mapc is used because only the printing side effect is needed
(mapc #'print-line *data*)
Result:
tiscali              10000   2.31   0.84  -14700.00 none
atlantia                50  22.65  22.68       1.50 none
bper-banca            1000   1.59   2.01     423.00 none
alerion-cleanpower      30  44.14  36.45    -230.70 none
tesmec               10000   0.12   0.14     150.00 none
cover-50               120   8.95   9.60      78.00 none
ovs                   1000   1.71   1.93     217.00 none
credito-emiliano       200   5.70   6.26     112.00 none
I have something like this in my personal code, that I reproduced here in a simplified way:
(defpackage :tabular (:use :cl))
(in-package :tabular)
I have a function that turns any object into a list of values (a row), here the usage is for a list of values, so it is already in the correct shape.
(defgeneric columnize (object)
  (:documentation "Representation of object as a list of fields")
  (:method ((o list)) o))
I also define a transpose function that works with lists of various sizes (it stops as soon as the shortest list is exhausted):
(defun transpose (lists)
  (when (notany #'null lists)
    (cons
     (mapcar #'first lists)
     (transpose (mapcar #'cdr lists)))))
Here is your data, as defined by Chris:
(defparameter *data*
  '(("tiscali" 10000 2.31 0.84 -14700.0 "none")
    ("atlantia" 50 22.65 22.68 1.5 "none")
    ("bper-banca" 1000 1.59 2.01 423.0 "none")
    ("alerion-cleanpower" 30 44.14 36.45 -230.7 "none")
    ("tesmec" 10000 0.12 0.14 150.0 "none")
    ("cover-50" 120 8.95 9.6 78.0 "none")
    ("ovs" 1000 1.71 1.93 217.0 "none")
    ("credito-emiliano" 200 5.7 6.26 112.0 "none")))
And finally, a function that prints a list of objects in a tabular way.
Basically, I convert all objects to lists of values, convert the values to strings, and compute their lengths. This gives a matrix of sizes that I transpose to get, for each column, the list of its cell sizes; this is used to compute the width of each column, based on the maximum size of the actual data.
In practice, I also allow the generic function to add indicators such as how to justify (left/right), etc.
(defun tabulate (stream objects)
  (loop
    for n from 0
    for o in objects
    for row = (mapcar #'princ-to-string (columnize o))
    collect row into rows
    collect (mapcar #'length row) into row-widths
    finally
       (flet ((build-format-arguments (max-width row)
                (when (> max-width 0)
                  (list max-width #\space row))))
         (loop
           with number-width = (ceiling (log n 10))
           with col-widths = (transpose row-widths)
           with max-col-widths = (mapcar (lambda (s) (reduce #'max s)) col-widths)
           for index from 0
           for row in rows
           for entries = (mapcan #'build-format-arguments max-col-widths row)
           do (format stream
                      "~v,'0d. ~{~v,,,va~^ ~}~%"
                      number-width index entries)))))
For example:
(fresh-line)
(tabulate *standard-output* *data*)
Gives:
0. tiscali            10000 2.31  0.84  -14700.0 none
1. atlantia           50    22.65 22.68 1.5      none
2. bper-banca         1000  1.59  2.01  423.0    none
3. alerion-cleanpower 30    44.14 36.45 -230.7   none
4. tesmec             10000 0.12  0.14  150.0    none
5. cover-50           120   8.95  9.6   78.0     none
6. ovs                1000  1.71  1.93  217.0    none
7. credito-emiliano   200   5.7   6.26  112.0    none
As you can see, there are some adjustments that could be made to format floating-point values so that they align on the dot, but this is already quite useful.

How to perform calculation with cumulative sum using ARRAYFORMULA

Is it possible to perform an arbitrary calculation (eg. A2*B2) on a set of rows and obtain the cumulative sum along the way using ARRAYFORMULA? For example, in the following sheet we have numbers (column A), multipliers (column B), the result of multiplying them (column C), and a cumulative tally (column D):
  | A       B           C       D           E              F
-------------------------------------------------------------------------------
1 | number  multiplier  result  cumulative  array formula  array formula sum?
2 | 3       4           12      12          12
3 | 2       4           8       20          8
4 | 10      1           10      30          10
5 | 7       9           63      93          63
I can use ARRAYFORMULA in cell E2 (specifically, ARRAYFORMULA(A2:A5*B2:B5)) to do the multiplication. Is it possible to use ARRAYFORMULA (or alternative tool) in cell F2 to show the cumulative total?
use:
=ARRAYFORMULA(IF(A2:A="",,MMULT(TRANSPOSE((ROW(A2:A)<=
TRANSPOSE(ROW(A2:A)))*A2:A*B2:B), SIGN(B2:B))))
Calculate the cumulative sum with the SCAN and LAMBDA functions:
=SCAN(0, F5:F, LAMBDA(accumulated_value, cell_value, accumulated_value + cell_value))
This will run faster as it runs with linear complexity (O(N)) compared to the ARRAYFORMULA solution, which runs in quadratic time (O(N**2)).
Where:
0 is the initial value of the cumulative sum
F5:F is the range to sum over
LAMBDA(accumulated_value, cell_value, accumulated_value + cell_value) is the function that calculates the sum at each cell
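To make the difference between the two approaches concrete, here is a rough Python sketch using the four rows from the question. The variable names are illustrative only: the accumulate call plays the role of SCAN (one pass), while the nested sum imitates the triangular mask that the MMULT formula builds (one pass per row).

```python
from itertools import accumulate

# Columns A and B from the question's sheet (rows 2 to 5).
numbers = [3, 2, 10, 7]
multipliers = [4, 4, 1, 9]

# Column C / E: the row-wise products.
products = [n * m for n, m in zip(numbers, multipliers)]

# SCAN-style running total: each step adds one value to the accumulator (linear).
running = list(accumulate(products))

# MMULT-style running total: row i sums every product with index <= i (quadratic).
triangular = [
    sum(p for j, p in enumerate(products) if j <= i)
    for i in range(len(products))
]

print(products)    # [12, 8, 10, 63]
print(running)     # [12, 20, 30, 93]
print(triangular)  # [12, 20, 30, 93]
```

Both variants reproduce column D of the question; they differ only in how much work is done per row.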

When doing classification, why do I get different precision for the same testing data?

I am testing a dataset with two labels, 'A' and 'B', on a decision tree classifier. I accidentally found out that the model gets different precision results on the same testing data. I want to know why.
Here is what I do: I train the model and test it on
1. the testing set,
2. the data only labelled 'A' in the testing set,
3. and the data only labelled 'B'.
Here is what I got:
for testing dataset
   precision  recall  f1-score  support
A  0.94       0.95    0.95      25258
B  0.27       0.22    0.24      1963
for data only labelled 'A' in testing dataset
   precision  recall  f1-score  support
A  1.00       0.95    0.98      25258
B  0.00       0.00    0.00      0
for data only labelled 'B' in testing dataset
   precision  recall  f1-score  support
A  0.00       0.00    0.00      0
B  1.00       0.22    0.36      1963
The training dataset and model are the same, and the data in the 2nd and 3rd tests are also the same as those in the 1st. Why do the precisions for 'A' and 'B' differ so much? What is the real precision of this model? Thank you very much.
You sound confused, and it is not at all clear why you are interested in metrics where you have completely removed one of the two labels from your evaluation set.
Let's explore the issue with some reproducible dummy data:
from sklearn.metrics import classification_report
import numpy as np
y_true = np.array([0, 1, 0, 1, 1, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 0, 1])
target_names = ['A', 'B']
print(classification_report(y_true, y_pred, target_names=target_names))
Result:
             precision  recall  f1-score  support
A            0.50       0.50    0.50      4
B            0.33       0.33    0.33      3
avg / total  0.43       0.43    0.43      7
Now, let's keep only class A in our y_true:
indA = np.where(y_true==0)
print(indA)
print(y_true[indA])
print(y_pred[indA])
Result:
(array([0, 2, 5, 6], dtype=int64),)
[0 0 0 0]
[0 1 0 1]
Now, here is the definition of precision from the scikit-learn documentation:
The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.
For class A, a true positive (tp) would be a case where the true class is A (0 in our case), and we have indeed predicted A (0); from above, it is apparent that tp=2.
The tricky part is the false positives (fp): they are the cases where we have predicted A (0), but the true label is B (1). It is apparent here that we cannot have any such cases, since we have (intentionally) removed all the B's from our y_true (why would we want to do such a thing? I don't know, it does not make any sense at all); hence, fp=0 in this (weird) setting. Hence, our precision for class A will be tp / (tp+0) = tp/tp = 1.
Which is the exact same result given by the classification report:
print(classification_report(y_true[indA], y_pred[indA], target_names=target_names))
# result:
             precision  recall  f1-score  support
A            1.00       0.50    0.67      4
B            0.00       0.00    0.00      0
avg / total  1.00       0.50    0.67      4
and obviously the case for B is identical.
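The same arithmetic can be checked by hand without scikit-learn. The sketch below reuses the dummy y_true and y_pred from above; the precision and recall helpers are written here just for illustration and follow the tp / (tp + fp) and tp / (tp + fn) definitions.

```python
def precision(y_true, y_pred, label):
    """tp / (tp + fp) for one label; returns 0.0 when nothing was predicted as label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred, label):
    """tp / (tp + fn) for one label; returns 0.0 when the label never occurs."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != label and t == label)
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [0, 1, 0, 1, 1, 0, 0]  # 0 = A, 1 = B
y_pred = [0, 0, 1, 1, 0, 0, 1]

# Full evaluation set: B samples predicted as A count as false positives for A.
full_precision_a = precision(y_true, y_pred, 0)
full_recall_a = recall(y_true, y_pred, 0)

# Keep only the samples whose true label is A: no false positives for A remain.
kept = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
sub_true = [t for t, _ in kept]
sub_pred = [p for _, p in kept]
sub_precision_a = precision(sub_true, sub_pred, 0)
sub_recall_a = recall(sub_true, sub_pred, 0)

print(full_precision_a, full_recall_a)  # 0.5 0.5
print(sub_precision_a, sub_recall_a)    # 1.0 0.5
```

Recall for A is unchanged because it only depends on the samples that are truly A; precision jumps to 1 because every possible false positive was filtered out.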
why the precision is not 1 in case #1 (for both A and B)? The data are the same
No, they are very obviously not the same - the ground truth is altered!
Bottom line: removing classes from your y_true before computing precision etc. does not make any sense at all (i.e. your reported results in case #2 and case #3 are of no practical use whatsoever); but, since for whatever reasons you decide to do so, your reported results are exactly as expected.

Average of Consecutive values with multiple criteria

How can I get the average length of consecutive "win & buy" runs? For "win & buy", the sum of the run lengths would be 1+1+1+1+2 = 6 (win&buy + win&buy + win&buy + win&buy + (win&buy + win&buy)); divided by the number of occurrences, in this case 5, that gives 1.2.
Another example: for "win", the sum of the run lengths would be 1+1+1+2+4 = 9 (there are 3 single "win"s, 2 consecutive "win"s, and finally 4 consecutive "win"s at the bottom); divided by the number of occurrences, in this case 5, that gives 1.8.
=ArrayFormula(MAX(FREQUENCY(IF((A2:A="Buy")*($B$2:$B="WIN"),ROW($B$2:$B)),IF(not((A2:A="Buy")*($B$2:$B="WIN")),ROW($B$2:$B)))))
=ArrayFormula(MAX(FREQUENCY(IF((A2:A="Buy")*($B$2:$B="WIN"),ROW($B$2:$B)),IF((A2:A<>"Buy")+($B$2:$B<>"WIN"),ROW($B$2:$B)))))
I got the above formulas from @Tom Sharpe for the MAX of consecutive values and tried to average them, but with all the 0's in the calculation, I can't get a correct answer.
Sample sheet included.
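To make the intended calculation concrete, here is a small Python sketch. The flags list is hypothetical (the real sheet is not reproduced here); it is merely arranged so that the matching rows form runs of length 1, 1, 1, 2 and 4, as in the "win" example.

```python
from itertools import groupby


def run_lengths(flags):
    """Lengths of each maximal run of consecutive True values."""
    return [sum(1 for _ in group) for key, group in groupby(flags) if key]


# Hypothetical column: True where the row matches the criteria (e.g. "win").
flags = [True, False, True, False, True, False,
         True, True, False,
         True, True, True, True]

lengths = run_lengths(flags)
average = sum(lengths) / len(lengths)

print(lengths)  # [1, 1, 1, 2, 4]
print(average)  # 1.8
```

Runs of non-matching rows are skipped entirely, which is the same reason the zero counts have to be filtered out before averaging in the sheet.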
AVG WIN & BUY:
=AVERAGE(QUERY(ARRAYFORMULA(FREQUENCY(
IF( (A2:A="BUY")*($B$2:$B="WIN"), ROW($B$2:$B)),
IF(NOT((A2:A="BUY")*($B$2:$B="WIN")), ROW($B$2:$B)))),
"where Col1>0"))
AVG WIN & SELL:
=AVERAGE(QUERY(ARRAYFORMULA(FREQUENCY(
IF( (A2:A="SELL")*($B$2:$B="WIN"), ROW($B$2:$B)),
IF(NOT((A2:A="SELL")*($B$2:$B="WIN")), ROW($B$2:$B)))),
"where Col1>0"))

Artificial Neural Network Topology

I am currently revising for my final-year exams and came across this question. I have looked everywhere in my lecture slides for any sort of help and cannot find any. Any insight into how to solve this question would be appreciated (I am not just asking for the answer, I need to comprehend the topic). Furthermore, do I assume that all inputs are equal to 1? Do I include 7 inputs in the input layer? I'm at a loss as to how to answer.
The question is as follows:
b) Determine, with justification, the simplest type and topology (i.e. number of neurons & layers) of artificial neural network that could learn the data set below.
Click here for picture of the dataset.
If I'm not mistaken, you have two inputs, X1 and X2, and one target output. For each input, consisting of two numbers X1 and X2, the appropriate output ("target") is given.
As a first step, you could sketch the seven data points: just draw the 3 ones and 4 zeroes at the right places on the square (X1, X2) ∈ [0, 1.05] × [0, 1]. Maybe you remember something similar from the lecture, possibly near a mention of "XOR".
The edit queue is full, so adding data from the linked image here
Pattern  X1     X2     Target
1        0.01   -0.1   1
2        0.90   0.09   0
3        0.89   -0.05  0
4        1.05   0.95   1
5        -0.01  0.12   0
6        1.05   0.97   1
7        0.98   0.10   0
It looks like 1 possible solution is X1 >= 1.0 OR X2 <= -0.1
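A quick way to check that rule is to evaluate it on all seven patterns. This is only a sanity check in Python with the data typed in from the table; the function name rule is made up for the example.

```python
# (X1, X2, Target) for patterns 1 to 7, copied from the table.
patterns = [
    (0.01, -0.1, 1),
    (0.90, 0.09, 0),
    (0.89, -0.05, 0),
    (1.05, 0.95, 1),
    (-0.01, 0.12, 0),
    (1.05, 0.97, 1),
    (0.98, 0.10, 0),
]


def rule(x1, x2):
    """Candidate decision rule: X1 >= 1.0 OR X2 <= -0.1."""
    return int(x1 >= 1.0 or x2 <= -0.1)


predictions = [rule(x1, x2) for x1, x2, _ in patterns]
targets = [target for _, _, target in patterns]

print(predictions == targets)  # True
```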
Alternatively, if you round each of X1 and X2, it becomes
Pattern  X1  X2  Target
1        0   0   1
2        1   0   0
3        1   0   0
4        1   1   1
5        0   0   0
6        1   1   1
7        1   0   0
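Then it looks like an XOR-type problem: apart from pattern 5, the rounded targets follow NOT(round(X1) XOR round(X2)), i.e. the output is 1 when the two rounded inputs are equal. Note that patterns 1 and 5 both round to (0, 0) but have different targets, so rounding alone cannot reproduce the data exactly. Either way, the two classes cannot be separated by a single straight line, so a single neuron is not enough. In that case you can use 1 hidden layer of 2 neurons with a nonlinear activation (like round/step, ReLU, or sigmoid) and 1 output layer of 1 neuron.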
See this stackoverflow post for a detail of how to solve XOR with a neural net.
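As an illustration of that topology, here is a hand-wired network with 2 hidden neurons and 1 output neuron in Python. The weights are not learned; they are picked by hand (an assumption made for this sketch) so that the hidden neurons implement the two thresholds from the rule X1 >= 1.0 OR X2 <= -0.1, and a step function is assumed as the activation.

```python
def step(z):
    """Step activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def forward(x1, x2):
    # Hidden neuron 1: weight 1 on X1, bias -1.0, so it fires when X1 >= 1.0.
    h1 = step(x1 - 1.0)
    # Hidden neuron 2: weight -1 on X2, bias -0.1, so it fires when X2 <= -0.1.
    h2 = step(-x2 - 0.1)
    # Output neuron: weights 1 and 1, bias -0.5, so it acts as a logical OR.
    return step(h1 + h2 - 0.5)


# (X1, X2, Target) for patterns 1 to 7, copied from the table.
patterns = [
    (0.01, -0.1, 1),
    (0.90, 0.09, 0),
    (0.89, -0.05, 0),
    (1.05, 0.95, 1),
    (-0.01, 0.12, 0),
    (1.05, 0.97, 1),
    (0.98, 0.10, 0),
]

outputs = [forward(x1, x2) for x1, x2, _ in patterns]
targets = [target for _, _, target in patterns]

print(outputs)             # [1, 0, 0, 1, 0, 1, 0]
print(outputs == targets)  # True
```

A trained network would find its own weights, but this shows that two hidden neurons plus one output neuron are sufficient to fit all seven patterns.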
