Calculate statistical difference from pivot table - google-sheets

I've built this table:
hs_name1 | AVERAGE of sat_composite (s_male=0) | STDEV of sat_composite (s_male=0) | COUNT of s_lasid (s_male=0) | AVERAGE of sat_composite (s_male=1) | STDEV of sat_composite (s_male=1) | COUNT of s_lasid (s_male=1)
Hope High School | 986 | 600 | 639 | 979 | 630 | 579
James High School | 837 | 568 | 473 | 830 | 612 | 428
Juniper High School | 789 | 525 | 538 | 722 | 577 | 466
Kennedy High School | 531 | 468 | 314 | 523 | 484 | 239
King High School | 683 | 540 | 275 | 619 | 569 | 258
Lincoln High School | 842 | 538 | 354 | 933 | 534 | 279
Meadowbrook High School | 484 | 517 | 292 | 484 | 507 | 274
North Falls High School | 1056 | 531 | 590 | 1046 | 547 | 564
Orange High School | 905 | 597 | 555 | 828 | 619 | 526
Polk High School | 680 | 569 | 567 | 691 | 568 | 501
South Falls High School | 898 | 602 | 488 | 904 | 584 | 461
Upper Hills High School | 457 | 491 | 349 | 431 | 490 | 248
Washington High School | 795 | 609 | 482 | 818 | 635 | 401
Grand Total | 801 | 585 | 5916 | 796 | 603 | 5224
Also pictured here:
I now want to calculate whether the average sat_composite score for women (s_male=0) is statistically different from the average for men (s_male=1).
I've been trying to figure this out and I am a little lost. Any help would be greatly appreciated.

What you're looking for is just a regular t-test. In Google Sheets the syntax is:
=TTEST(B2:B8,C2:C8,2,2)
That will give you the p-value associated with the test.
Arguments:
Range for male scores
Range for female scores
How many tails? 1 or 2
Type of t-test: 1 for paired, 2 for two-sample with equal variance, 3 for two-sample with unequal variance. This choice is important and depends on whether certain assumptions have been violated (primarily independence and homoscedasticity).
I agree with Tom's suggestion in the comments about going back to the raw data, as you'll be able to model your data better. In your example you've already collapsed across schools by calculating means, which loses information.
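If only the pivot-table summary is available, a rough check can still be made from the means, standard deviations, and counts. The sketch below is not what TTEST does on raw ranges. It is a Welch-style (unequal variance) statistic computed from the Grand Total row, with a normal approximation for the p-value that is only reasonable because both groups are large. The function name is hypothetical, and it assumes the Grand Total STDEV values are the standard deviations of the raw scores in each group.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, approximate two-sided p) for two independent groups.

    Uses only summary statistics. The p-value comes from a normal
    approximation to the t distribution, so it is only a rough guide
    and only sensible for large samples.
    """
    # Standard error of the difference in means (unequal variances).
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    # Two-sided tail probability under the standard normal.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Grand Total row from the pivot table: s_male=0 first, then s_male=1.
t_stat, p_value = welch_t_from_summary(801, 585, 5916, 796, 603, 5224)
print(f"t = {t_stat:.3f}, approximate two-sided p = {p_value:.3f}")
```

Running the test on the raw per-student scores, as suggested above, remains the better option because it does not depend on how the pivot table aggregated the data.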

If number in range was >= 100 and subsequently <100

Anyone got any ideas on how to do this?
I'm trying to build a spreadsheet that helps me monitor the performance of my blog articles. So if the article historically had >=100 visits at any point but subsequently gets <100 at any point I want to know about it.
The formula I've been playing with is:
=IF(((FILTER(C2:G2,C2:G2<>E2))>=100 AND (FILTER(C2:G2,C2:G2<>E2))<100, "Article Failing", ""))
I'm using Filter btw because I need to exclude column E, which is the delta between this month's & last month's numbers.
I know the formula isn't logically right, but I'm struggling to think of a way to do it.
Edit:
Here's a link to the spreadsheet with desired output https://docs.google.com/spreadsheets/d/1TeaQ6oUbJDeKxUi8tvvCWXtw0oK9d5IVO60j1UbQCK8/edit?usp=sharing
Here's a table showing the sample data and desired output:
Total users (last 30 days) | Total users (prev 30 days) | Delta - Total users | Total users last 30-60 days | Total users prev 60-90 days | Delta - Total users | Above 100 | Article Failing
651 | 90 | -417 | 772 | 249 | 523 | Tweak Article | Failing
610 | 570 | 40 | 550 | 432 | 118 | Tweak Article | OK
436 | 409 | 27 | 328 | 210 | 118 | Tweak Article | OK
422 | 288 | 134 | 53 | 288 | -235 | Tweak Article | OK
95 | 476 | -90 | 417 | 477 | -60 | Below100 | Failing
337 | 179 | 158 | 129 | 182 | -53 | Tweak Article | OK
305 | 395 | -90 | 318 | 343 | -25 | Tweak Article | OK
304 | 348 | -44 | 299 | 253 | 46 | Tweak Article | OK
302 | 277 | 25 | 283 | 317 | -34 | Tweak Article | OK
286 | 252 | 34 | 268 | 281 | -13 | Tweak Article | OK
213 | 193 | 20 | 221 | 168 | 53 | Tweak Article | OK
157 | 138 | 19 | 132 | 166 | -34 | Tweak Article | OK
150 | 157 | -7 | 110 | 68 | 42 | Tweak Article | OK
I've made cells B2 & A6 be failing articles i.e. they were >=100 but have since gone below 100. The end column 'Article Failing' is where I'm trying to create the formula.
Hope that makes things a bit clearer.
This formula will match the desired results you show in the sample spreadsheet:
=if(
(max(A$2:A2) >= 100) * (A2 < 100)
+
(max(B$2:B2) >= 100) * (B2 < 100)
+
(row(B2) = row(B$2)) * (B2 < 100),
"Failing",
"OK"
)
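To make the logic of that formula easier to follow, here is a small Python sketch of the same three checks, applied row by row to columns A and B of the sample data. The function name and the threshold parameter are hypothetical. The sheet formula adds the products of boolean tests, which behaves like a logical OR, and that is how it is written here.

```python
def failing_flags(col_a, col_b, threshold=100):
    """Mirror the sheet formula for each row of columns A and B."""
    flags = []
    for i in range(len(col_a)):
        # max(A$2:A2) >= 100 and A2 < 100: A was high earlier, low now.
        a_dropped = max(col_a[: i + 1]) >= threshold and col_a[i] < threshold
        # Same test for column B.
        b_dropped = max(col_b[: i + 1]) >= threshold and col_b[i] < threshold
        # row(B2) = row(B$2): special case for the first data row.
        first_row_low = i == 0 and col_b[i] < threshold
        failing = a_dropped or b_dropped or first_row_low
        flags.append("Failing" if failing else "OK")
    return flags


# Columns A and B from the sample table in the question.
col_a = [651, 610, 436, 422, 95, 337, 305, 304, 302, 286, 213, 157, 150]
col_b = [90, 570, 409, 288, 476, 179, 395, 348, 277, 252, 193, 138, 157]
flags = failing_flags(col_a, col_b)
print(flags)
```

With the sample data, only the first and fifth rows come out as "Failing", which corresponds to cells B2 and A6 in the question.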

How to extend nonlinear curve beyond supplied data in google sheets

I have a plotted spectral curve in Google Sheets. All points are real coordinates. As you can see, no data is provided for the slope below 614nm. I would like to extend the slope beyond the supplied data so that it reaches 0, in a mathematically sound way that follows the trajectory it was taking from where the slope started. Someone mentioned that I might have to use a linear regression, but I'm not sure what that is. How would I go about extending this slope, following its defined trajectory, down to 0 in Google Sheets?
Here's the data
x-axis:
614
616
618
620
622
624
626
628
630
632
634
636
638
640
642
644
646
648
650
652
654
656
658
660
662
664
666
668
670
672
674
676
678
680
682
684
686
688
690
692
694
696
698
700
702
704
706
708
710
712
714
716
718
720
722
724
726
728
730
y-axis:
0.7101
0.7863
0.8623
0.9345
1.0029
1.069
1.1317
1.1898
1.2424
1.289
1.3303
1.3667
1.3985
1.4261
1.4499
1.47
1.4867
1.5005
1.5118
1.5206
1.5273
1.532
1.5348
1.5359
1.5355
1.5336
1.5305
1.5263
1.5212
1.5151
1.5079
1.4994
1.4892
1.4771
1.4631
1.448
1.4332
1.4197
1.4088
1.4015
1.3965
1.3926
1.388
1.3813
1.3714
1.359
1.345
1.3305
1.3163
1.303
1.2904
1.2781
1.2656
1.2526
1.2387
1.2242
1.2091
1.1937
1.1782
Thanks
I understand that you want to extend the curve beyond the given data, in a mathematically sound fashion, until it reaches 0. Below I show how to do this using the last 2 data points, which makes the extended data linear. It might help; take a look at this Sheet.
We need to:
1 - Paste this SEQUENCE formula in C3 to number the input rows in order:
=SEQUENCE(COUNTA(B3:B),1,1,1)
2 - SORT the input by pasting this formula in E3:
=SORT(A3:C61,3,0)
3 - In F62, directly below the last row of the sorted data, paste this TREND formula. TREND fits an ideal linear trend to the known data using the least squares approach and predicts additional values:
=TREND(F60:F61,E60:E61,E62:E101)
TREND takes:
known_data_y, set to F60:F61
[known_data_x], set to E60:E61 (together with F60:F61, these are the 2 data points)
[new_data_x], set to E62:E101, the new x values entered below the last row of the sorted data in the "x-axis:" column of the output table, starting at cell E62
4 - To show the newly generated data as the red curve, we need a new column that starts at K62 and runs to the bottom of the "y-axis:" data in the output table. Paste this ArrayFormula in K62:
=ArrayFormula(E62:G101)
5 - Add a series to the chart: chart editor > Setup > Series > Add Series.
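The same idea can be checked outside the spreadsheet. The following is a minimal Python sketch, assuming a straight line through the two lowest-wavelength points from the question, (614, 0.7101) and (616, 0.7863), and stepping down in 2nm increments until the line reaches 0. The function name and the step parameter are hypothetical. With only two known points, a least squares fit reduces to the line through those points.

```python
def extend_to_zero(x1, y1, x2, y2, step):
    """Extend the line through (x1, y1) and (x2, y2) away from x1 until y = 0.

    Returns the x value where the line crosses zero and the list of
    extrapolated (x, y) points, ending with the zero crossing itself.
    """
    slope = (y2 - y1) / (x2 - x1)
    if slope * step >= 0 or y1 <= 0:
        raise ValueError("stepping in this direction never reaches 0")
    x_zero = x1 - y1 / slope  # solve y1 + slope * (x - x1) = 0 for x
    points = []
    x = x1 + step
    y = y1 + slope * (x - x1)
    while y > 0:
        points.append((x, y))
        x += step
        y = y1 + slope * (x - x1)
    points.append((x_zero, 0.0))
    return x_zero, points


# First two data points from the question, extended toward shorter wavelengths.
x_zero, extra = extend_to_zero(614, 0.7101, 616, 0.7863, step=-2)
print(f"line reaches 0 near x = {x_zero:.2f}")
for x, y in extra:
    print(f"{x:8.2f}  {y:.4f}")
```

A straight line is only one possible choice. Because the data is curved, fitting more points or a different model would give a different zero crossing.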

How to only KEEP almost similar records in cognos

You're right!
This is what we are trying to achieve:
For each OrderID, keep ONLY the rows where ColumnD has at least 2 identical consecutive values AND ColumnC has at least 2 identical values.
For example, we would keep:
the first 2 rows of OrderID 101
the 3 rows of OrderID 104
the 2 rows of OrderID 305
The rest we don't want to see in the report!
Here is an image that might help explain what we want to achieve.
OrderID | ColumnB | ColumnC | ColumnD
101 | 159 | 10 | 18$
101 | 132 | 10 | 18$
101 | 147 | 22 | 18$
102 | 111 | 12 | 55$
103 | 130 | 10 | 18$
104 | 123 | 381 | 75$
104 | 456 | 381 | 75$
104 | 789 | 381 | 75$
305 | 555 | 101 | 37$
305 | 652 | 101 | 37$
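The rule can be pinned down with a small Python sketch. This is not Cognos syntax; it only illustrates one reading of the requirement, namely that a row is kept when at least one other row with the same OrderID has the same ColumnC and ColumnD values. It does not check that the matching rows are adjacent, although they are in the sample data. The function name is hypothetical.

```python
from collections import Counter

# Sample rows from the question: (OrderID, ColumnB, ColumnC, ColumnD).
rows = [
    (101, 159, 10, "18$"),
    (101, 132, 10, "18$"),
    (101, 147, 22, "18$"),
    (102, 111, 12, "55$"),
    (103, 130, 10, "18$"),
    (104, 123, 381, "75$"),
    (104, 456, 381, "75$"),
    (104, 789, 381, "75$"),
    (305, 555, 101, "37$"),
    (305, 652, 101, "37$"),
]


def keep_similar(rows):
    """Keep rows whose (OrderID, ColumnC, ColumnD) occurs at least twice."""
    counts = Counter((order_id, c, d) for order_id, _, c, d in rows)
    return [r for r in rows if counts[(r[0], r[2], r[3])] >= 2]


kept = keep_similar(rows)
for row in kept:
    print(row)
```

On the sample data this keeps the first 2 rows of OrderID 101, all 3 rows of OrderID 104, and both rows of OrderID 305, matching the examples given in the question.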

Converting .xyz file coordinates to LibSVM format?

I've run into a bit of a dilemma. I've been doing some 3D scanning and would like to convert the .xyz file obtained from the scanning process to LibSVM format.
The .xyz file looks like this:
31 423 578
34 423 582
42 423 621
43 423 650
47 423 668
48 423 677
80 423 670
84 423 589
86 423 602
88 404 553
89 403 583
89 404 664
90 393 673
90 396 563
90 397 607
90 403 624
90 404 666
91 409 517
91 411 579
And LibSVM format is like this:
<label> <index1>:<value1> <index2>:<value2> ...
What should I consider before going about this process? What exactly would my label and index value(s) be? I assume value1 would be the x coordinate (please correct me if I'm wrong).
Any demonstration code to give me the gist of the process would certainly be appreciated, but an explanation in words would be great too.
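As a rough illustration of the mechanics only, the Python sketch below turns each "x y z" line into a LibSVM line with feature indices 1, 2, and 3 for the x, y, and z coordinates. LibSVM feature indices start at 1 and must appear in ascending order. The label is the open question: an .xyz point has no class attached, so the sketch uses a placeholder label of 0, which is purely an assumption. The function name is hypothetical.

```python
def xyz_to_libsvm(lines, label=0):
    """Convert 'x y z' text lines to LibSVM '<label> 1:x 2:y 3:z' lines.

    The label is a placeholder; a real label must come from whatever
    class or target value each point is supposed to have.
    """
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        features = " ".join(f"{i}:{v}" for i, v in enumerate(parts, start=1))
        out.append(f"{label} {features}")
    return out


# First few lines of the sample .xyz data from the question.
sample = ["31 423 578", "34 423 582", "42 423 621"]
converted = xyz_to_libsvm(sample)
for line in converted:
    print(line)
```

What to consider beforehand is mainly what the model is meant to predict: if each point belongs to a known class or has a target value, that value becomes the label in place of the placeholder.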

gnuplot skips data file "with no valid points"

I've got a datafile with some values:
-55 471 485 500
-50 495 510 524
-40 547 562 576
-30 603 617 632
-20 662 677 691
-10 726 740 754
0 794 807 820
10 865 877 889
20 941 951 962
25 980 990 1000
30 1018 1029 1041
40 1097 1111 1125
50 1180 1196 1213
60 1266 1286 1305
70 1355 1378 1402
80 1447 1475 1502
90 1543 1575 1607
100 1642 1679 1716
110 1745 1786 1828
120 1849 1896 1943
125 1900 1950 2000
130 1950 2003 2056
140 2044 2103 2162
150 2124 2189 2254
When I call the following gnuplot script:
set terminal latex
set output 'foo.tex'
unset key
set format "%g"
set autoscale
set xlabel "Temperatur an $R_1$ [$^{{\degree}C}$]"
set ylabel 'Ladezeit [$ms$]'
f(r) =(log(1/3)*r*(47*(10e-6)))*-1
plot [-55:150] [0:3] '/some/path/res/kty_81-121.dat' using 1:(f($3)) with lines
gnuplot spits out a rather general warning: Skipping data file with no valid points. After hours of research on this problem I still have no answer.
Does someone know how to fix this?
When you divide an integer by an integer, gnuplot performs integer division and truncates the result to an integer. The argument of the log function therefore becomes zero (i.e. 1/3 = 0), and log(0) is -inf, so no point is valid. Change the function as below.
f(r) =(log(1.0/3.0)*r*(47*(10e-6)))*-1
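The effect can be reproduced outside gnuplot. The Python sketch below uses the `//` operator to mimic what gnuplot does with 1/3, then evaluates the corrected function for the first value in column 3 of the data file (485). It is an illustration in another language, not gnuplot code. The constant 10e-6 is copied as written in the script and equals 1e-5.

```python
import math

# Integer division truncates: this is what gnuplot computes for 1/3.
truncated = 1 // 3
print(truncated)

# log(0) has no finite value; Python raises an error where gnuplot
# ends up with an undefined point.
try:
    math.log(truncated)
except ValueError as exc:
    print("log(0) failed:", exc)


def f(r):
    """Corrected function from the answer, with floating-point division."""
    return -(math.log(1.0 / 3.0) * r * (47 * 10e-6))


# First row of the data file: column 3 is 485.
first_value = f(485)
print(f"f(485) = {first_value:.4f}")
```

The corrected value falls inside the [0:3] y-range requested in the plot command, so the points are valid once the division is done in floating point.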
