Anyone got any ideas on how to do this?
I'm trying to build a spreadsheet that helps me monitor the performance of my blog articles. So if the article historically had >=100 visits at any point but subsequently gets <100 at any point I want to know about it.
The formula I've been playing with is:
=IF(((FILTER(C2:G2,C2:G2<>E2))>=100 AND (FILTER(C2:G2,C2:G2<>E2))<100, "Article Failing", ""))
I'm using Filter btw because I need to exclude column E, which is the delta between this month's & last month's numbers.
I know the formula isn't logically right, but I'm struggling to think of a way to do it.
Edit:
Here's a link to the spreadsheet with desired output https://docs.google.com/spreadsheets/d/1TeaQ6oUbJDeKxUi8tvvCWXtw0oK9d5IVO60j1UbQCK8/edit?usp=sharing
Here's a table showing the sample data and desired output:
| Total users (last 30 days) | Total users (prev 30 days) | Delta - Total users | Total users last 30-60 days | Total users prev 60-90 days | Delta - Total users | Above 100 | Article Failing |
|---|---|---|---|---|---|---|---|
| 651 | 90 | -417 | 772 | 249 | 523 | Tweak Article | Failing |
| 610 | 570 | 40 | 550 | 432 | 118 | Tweak Article | OK |
| 436 | 409 | 27 | 328 | 210 | 118 | Tweak Article | OK |
| 422 | 288 | 134 | 53 | 288 | -235 | Tweak Article | OK |
| 95 | 476 | -90 | 417 | 477 | -60 | Below100 | Failing |
| 337 | 179 | 158 | 129 | 182 | -53 | Tweak Article | OK |
| 305 | 395 | -90 | 318 | 343 | -25 | Tweak Article | OK |
| 304 | 348 | -44 | 299 | 253 | 46 | Tweak Article | OK |
| 302 | 277 | 25 | 283 | 317 | -34 | Tweak Article | OK |
| 286 | 252 | 34 | 268 | 281 | -13 | Tweak Article | OK |
| 213 | 193 | 20 | 221 | 168 | 53 | Tweak Article | OK |
| 157 | 138 | 19 | 132 | 166 | -34 | Tweak Article | OK |
| 150 | 157 | -7 | 110 | 68 | 42 | Tweak Article | OK |
I've made cells B2 & A6 be failing articles i.e. they were >=100 but have since gone below 100. The end column 'Article Failing' is where I'm trying to create the formula.
Hope that makes things a bit clearer.
This formula will match the desired results you show in the sample spreadsheet:
=if(
(max(A$2:A2) >= 100) * (A2 < 100)
+
(max(B$2:B2) >= 100) * (B2 < 100)
+
(row(B2) = row(B$2)) * (B2 < 100),
"Failing",
"OK"
)
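To make the logic of that formula easier to follow, here is a rough Python translation, checked against the first two columns of the sample data. It is only a sketch: the function name `flag_articles` and the `threshold` parameter are made up for illustration, and the lists stand in for columns A and B of the sheet.

```python
from itertools import accumulate

def flag_articles(col_a, col_b, threshold=100):
    """Mirror the sheet formula row by row (hypothetical helper)."""
    # Running maximum down each column, like max(A$2:A2) filled down.
    run_a = list(accumulate(col_a, max))
    run_b = list(accumulate(col_b, max))
    flags = []
    for i, (a, b) in enumerate(zip(col_a, col_b)):
        failing = (
            (run_a[i] >= threshold and a < threshold)
            or (run_b[i] >= threshold and b < threshold)
            # Special case from the formula: first data row with B below threshold.
            or (i == 0 and b < threshold)
        )
        flags.append("Failing" if failing else "OK")
    return flags

# Columns A and B from the sample table.
col_a = [651, 610, 436, 422, 95, 337, 305, 304, 302, 286, 213, 157, 150]
col_b = [90, 570, 409, 288, 476, 179, 395, 348, 277, 252, 193, 138, 157]
flags = flag_articles(col_a, col_b)
print(flags)
```

Note that the running maximum looks down the column (earlier rows), not across the row, which is how the formula reproduces the two "Failing" rows in the sample.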
You're right!
This is what we are trying to achieve:
For each OrderID, keep ONLY the rows where ColumnD has at least 2 identical consecutive values AND ColumnC has at least 2 identical values.
For example, we would keep:
the first 2 rows of OrderID 101
all 3 rows of OrderID 104
both rows of OrderID 305
We don't want to see the rest in the report!
Here is an image that might help explain what we want to achieve.
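| OrderID | ColumnB | ColumnC | ColumnD |
|---|---|---|---|
| 101 | 159 | 10 | 18$ |
| 101 | 132 | 10 | 18$ |
| 101 | 147 | 22 | 18$ |
| 102 | 111 | 12 | 55$ |
| 103 | 130 | 10 | 18$ |
| 104 | 123 | 381 | 75$ |
| 104 | 456 | 381 | 75$ |
| 104 | 789 | 381 | 75$ |
| 305 | 555 | 101 | 37$ |
| 305 | 652 | 101 | 37$ |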
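One possible reading of the rule, sketched in plain Python against the sample rows above. This is an interpretation, not a confirmed specification: it keeps a row when its ColumnC value occurs at least twice within the same OrderID and its ColumnD value matches the row directly before or after it in that OrderID. The function name `keep_rows` is hypothetical, and the rows are assumed to be already sorted by OrderID.

```python
from collections import Counter
from itertools import groupby

# Sample rows: (OrderID, ColumnB, ColumnC, ColumnD)
rows = [
    (101, 159, 10, "18$"), (101, 132, 10, "18$"), (101, 147, 22, "18$"),
    (102, 111, 12, "55$"), (103, 130, 10, "18$"),
    (104, 123, 381, "75$"), (104, 456, 381, "75$"), (104, 789, 381, "75$"),
    (305, 555, 101, "37$"), (305, 652, 101, "37$"),
]

def keep_rows(rows):
    """Filter rows per OrderID (assumes rows are grouped by OrderID)."""
    kept = []
    for _, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        c_counts = Counter(r[2] for r in group)
        for i, row in enumerate(group):
            # ColumnD equals the neighbouring row's ColumnD within the group.
            same_prev = i > 0 and group[i - 1][3] == row[3]
            same_next = i + 1 < len(group) and group[i + 1][3] == row[3]
            if c_counts[row[2]] >= 2 and (same_prev or same_next):
                kept.append(row)
    return kept

kept = keep_rows(rows)
for row in kept:
    print(row)
```

On the sample data this keeps the first two rows of 101, all three rows of 104 and both rows of 305, which matches the examples given in the question.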
I've built this table:
| hs_name1 | s_male=0: AVERAGE of sat_composite | s_male=0: STDEV of sat_composite | s_male=0: COUNT of s_lasid | s_male=1: AVERAGE of sat_composite | s_male=1: STDEV of sat_composite | s_male=1: COUNT of s_lasid |
|---|---|---|---|---|---|---|
| Hope High School | 986 | 600 | 639 | 979 | 630 | 579 |
| James High School | 837 | 568 | 473 | 830 | 612 | 428 |
| Juniper High School | 789 | 525 | 538 | 722 | 577 | 466 |
| Kennedy High School | 531 | 468 | 314 | 523 | 484 | 239 |
| King High School | 683 | 540 | 275 | 619 | 569 | 258 |
| Lincoln High School | 842 | 538 | 354 | 933 | 534 | 279 |
| Meadowbrook High School | 484 | 517 | 292 | 484 | 507 | 274 |
| North Falls High School | 1056 | 531 | 590 | 1046 | 547 | 564 |
| Orange High School | 905 | 597 | 555 | 828 | 619 | 526 |
| Polk High School | 680 | 569 | 567 | 691 | 568 | 501 |
| South Falls High School | 898 | 602 | 488 | 904 | 584 | 461 |
| Upper Hills High School | 457 | 491 | 349 | 431 | 490 | 248 |
| Washington High School | 795 | 609 | 482 | 818 | 635 | 401 |
| Grand Total | 801 | 585 | 5916 | 796 | 603 | 5224 |
I now want to calculate if the average SAT_composite score for women (s_male=0) is statistically different than for men (s_male=1).
I've been trying to figure this out and I am a little lost. Any help would be greatly appreciated.
What you're looking for is just a regular t-test. In google sheets the syntax is:
=TTEST(B2:B8,C2:C8,2,2)
That will give you the p value associated with the test
Arguments:
Range for male scores
Range for female scores
How many tails? 1 or 2
Type of t-test: 1 for paired, 2 for two-sample with equal variance (homoscedastic), 3 for two-sample with unequal variance (heteroscedastic). This choice is important and depends on whether certain assumptions have been violated (primarily independence and homoscedasticity)
I would agree with Tom's suggestion in the comments about going back to the raw data, as you'll be able to model your data better (in your example you've already collapsed across schools by calculating means, which loses information).
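If only the summary numbers are available, a rough check can still be made from means, standard deviations and counts. The sketch below computes a Welch-style t statistic from the Grand Total row of the pivot table. It rests on two assumptions that should be checked against the raw data: that the Grand Total STDEV is the standard deviation over all students of that group, and that the samples are large enough to use a normal approximation for the two-sided p value. The function name is made up for illustration.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and approximate two-sided p value from summary stats."""
    # Standard error of the difference in means (unequal variances).
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    # Normal approximation to the t distribution (assumes large samples).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Grand Total row: s_male=0 (801, 585, 5916) vs s_male=1 (796, 603, 5224).
t_stat, p_value = welch_t_from_summary(801, 585, 5916, 796, 603, 5224)
print(round(t_stat, 3), round(p_value, 3))
```

This is not a substitute for running TTEST on the per-student scores; it only gives a ballpark figure from the collapsed table.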
I am trying to implement a simple ARMA model but am having serious difficulty getting it to run. Adding a parameter to the error term works fine (see the return x_m1 + a*e statement, commented out below). However, if I add a parameter to the autoregressive part, I get a FloatingPointError, LinAlgError or PositiveDefiniteError, depending on the initialization method I use.
The code is also put into a gist you can find here. The model definition is replicated here:
import pymc3 as pm
import theano
import theano.tensor as tt

# N, F and data are defined in the linked gist and are not repeated here.
with pm.Model() as model:
    a = pm.Normal("a", 0, 1)
    sigma = pm.Exponential('sigma', 0.1, testval=F(.1))
    e = pm.Normal("e", 0, sigma, shape=(N-1,))

    # One step of the recursion: scan passes the sequence element first,
    # then the previous output, then the non-sequence parameter.
    def x(e, x_m1, a):
        # return x_m1 + a*e
        return a*x_m1 + e

    x, updates = theano.scan(
        fn=x,
        sequences=[e],
        outputs_info=[tt.as_tensor_variable(data.iloc[0])],
        non_sequences=[a]
    )
    x = pm.Deterministic('x', x)
    lam = pm.Exponential('lambda', 5.0, testval=F(.1))
    y = pm.StudentT("y", mu=x, lam=lam, nu=1, observed=data.values[1:])

with model:
    trace = pm.sample(2000, init="NUTS", n_init=1000)
Here are the errors for each initialization method:
"ADVI" / "ADVI_MAP": FloatingPointError: NaN occurred in ADVI optimization.
"MAP": LinAlgError: 35-th leading minor not positive definite
"NUTS": PositiveDefiniteError: Scaling is not positive definite. Simple check failed. Diagonal contains negatives. Check indexes [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71]
For details on the error messages, please look at this github issue posted at pymc3.
To be explicit, I really would like to have a scan-like solution which is easily extendable to for instance a full ARMA model. I know that one can represent the presented AR(1) model without scan by defining logP as already done in pymc3/distributions/timeseries.py#L18-L46, however I was not able to extend this vectorized style to a full ARMA model. The use of theano.scan seems preferable I think.
Any help is highly appreciated!
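For reference, the recursion that the scan step is meant to compute can be written in plain Python, without Theano or PyMC3. The sketch below also carries the previous error term forward, which is the extra state a scan-style ARMA(1,1) would need. The function name `arma11_path`, the parameter `b` and the choice of 0 for the initial previous error are assumptions for illustration; with b=0 it reduces to the AR(1) step a*x_m1 + e from the model above.

```python
def arma11_path(x0, e, a, b=0.0):
    """Unroll x_t = a*x_{t-1} + e_t + b*e_{t-1} over the error sequence e."""
    xs = []
    x_prev = x0
    e_prev = 0.0  # assumed starting value for the lagged error
    for e_t in e:
        x_t = a * x_prev + e_t + b * e_prev
        xs.append(x_t)
        # Carry both pieces of state to the next step, as scan would.
        x_prev, e_prev = x_t, e_t
    return xs

print(arma11_path(1.0, [0.0, 0.0, 1.0], a=0.5))
print(arma11_path(0.0, [1.0, 0.0], a=0.5, b=0.5))
```

This says nothing about why the sampler fails; it only pins down which values x should take for a given a and e, which can be handy when checking the scan output.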
I've run into a bit of a dilemma. I've been doing some 3D scanning and would like to convert the .xyz file obtained from the scanning process to LibSVM format.
The .xyz file would look like this:
31 423 578
34 423 582
42 423 621
43 423 650
47 423 668
48 423 677
80 423 670
84 423 589
86 423 602
88 404 553
89 403 583
89 404 664
90 393 673
90 396 563
90 397 607
90 403 624
90 404 666
91 409 517
91 411 579
And LibSVM format is like this:
<label> <index1>:<value1> <index2>:<value2> ...
What is to be considered before going about this process? What exactly would my label and index value(s) be? I'm sure value1 would equal the x coordinate. (Please correct me if I'm wrong).
Any demonstration code to give me a gist of the process would certainly be appreciated. But words are great.
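As a starting point, here is a minimal sketch of the mechanical part of the conversion. It assumes each point is one sample with three features (x, y, z mapped to indices 1, 2, 3, since LibSVM feature indices start at 1) and writes a placeholder label of 0, because the scan data itself carries no class labels. What the label should actually be depends on what you want to learn, so treat `label` and the function name as stand-ins.

```python
def xyz_to_libsvm(lines, label=0):
    """Convert 'x y z' text lines to LibSVM-style lines (placeholder label)."""
    converted = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        # Features are numbered from 1: 1 is x, 2 is y, 3 is z.
        features = " ".join(f"{i}:{v}" for i, v in enumerate(parts, start=1))
        converted.append(f"{label} {features}")
    return converted

sample = ["31 423 578", "34 423 582", "", "42 423 621"]
for out_line in xyz_to_libsvm(sample):
    print(out_line)
```

If the goal is instead to predict one coordinate from the other two (for example z from x and y), the label would be that coordinate and the remaining two would become features 1 and 2.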
This program insists that 35 is a prime number even though, going through it step-by-step, the program should reach the point where it calculates 35%5 and then ignore the number (because the result is 0). I haven't checked every single number, but it seems to display only primes otherwise (except for numbers that are analogous to 35, like 135).
print ('How many prime numbers do you require?')
primes = io.read("*n")
print ('Here you go:')
num,denom,num_primes=2,2,0
while num_primes<primes do
  if denom<num then
    if num%denom==0 then
      num=num+1
    else
      denom=denom+1
    end
  else
    print(num)
    num=num+1
    num_primes=num_primes+1
    denom=2
  end
end
Sample output:
How many prime numbers do you require?
50
Here you go:
2
3
5
7
11
13
17
19
23
27
29
31
35
37
41
43
47
53
59
61
67
71
73
79
83
87
89
95
97
101
103
107
109
113
119
123
127
131
135
137
139
143
147
149
151
157
163
167
173
179
You aren't resetting denom in the % case.
if num%denom==0 then
  num=num+1
else
So when num is divisible and you move on to the next number, you start testing it from the previous denominator instead of from 2 again.
Simple debugging print lines in the loop, printing out denom and num, would have shown this to you (in fact, that's exactly how I found it). You only need the output for three prime numbers to see the issue.
Fixed it, set denom=2 after num=num+1
print ('How many prime numbers do you require?')
primes = io.read("*n")
print ('Here you go:')
num,denom,num_primes=2,2,0
while num_primes<primes do
  if denom<num then
    if num%denom==0 then
      num=num+1
      denom=2
    else
      denom=denom+1
    end
  else
    print(num)
    num=num+1
    num_primes=num_primes+1
    denom=2
  end
end
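To see the effect of the missing reset side by side, here is a small Python port of the same loop. It is a sketch for comparison only: the function name `first_primes` and the `reset_denom` switch are not part of the original program, and the port collects the numbers in a list instead of printing them.

```python
def first_primes(count, reset_denom=True):
    """Port of the Lua loop; reset_denom=False reproduces the original bug."""
    found = []
    num, denom = 2, 2
    while len(found) < count:
        if denom < num:
            if num % denom == 0:
                num += 1
                if reset_denom:
                    denom = 2  # the fix: restart trial division from 2
            else:
                denom += 1
        else:
            # denom reached num without finding a divisor
            found.append(num)
            num += 1
            denom = 2
    return found

print(first_primes(10, reset_denom=False))
print(first_primes(10))
```

Without the reset, 26 is rejected at denom=13, so 27 is only tested against 13 and above and slips through, which matches the 27 in the sample output.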