Using postgres percentile function with negative numbers - ruby-on-rails

I have a table containing records with negative numbers:
ID  Location   Temperature
1   Paris      -1
2   London     -2
3   Berlin     -3
4   Moscow     -4
5   Rome       -5
6   Warsaw     -6
7   Madrid     -7
8   Amsterdam  -8
9   Milan      -9
10  Zurich     -10
(my actual records and values are more numerous and more complex, but this should help illustrate the issue)
I want to get the minimum, first quartile, median, third quartile, and maximum of the temperature values, but in reverse order.
For instance, in my example I would have:
Aggregate       Value
Minimum         -1
First quartile  -2.5
Median          -5
Third quartile  -7.5
Maximum         -10
The problem as I see it is that my numbers are negative. So when I run:
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "city_temperatures"."temperature") AS percentile_temperature FROM "city_temperatures"
I actually get the third quartile value rather than the first quartile.
What's the best way to handle negative numbers in a query like this?

Add DESC to ORDER BY?
SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY t.temperature DESC) AS pct_temp
FROM city_temperatures t;
You can get all of them as an array in a single call with:
SELECT percentile_cont('{0,0.25,0.5,0.75,1}'::float8[])
WITHIN GROUP (ORDER BY t.temperature DESC) AS pct_temps
FROM city_temperatures t;
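Adding DESC mirrors the quartiles because of how percentile_cont interpolates. It takes the value at position fraction × (n - 1) of the ordered rows and interpolates linearly between the two neighbouring rows. The Python sketch below reproduces that rule. The function name and variables are illustrative assumptions, and the temperatures come from the question. It is not PostgreSQL code.

```python
import math


def percentile_cont(values, fraction, descending=False):
    """Continuous percentile with linear interpolation, modelled on
    PostgreSQL's percentile_cont (position = fraction * (n - 1))."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values, reverse=descending)  # ORDER BY ... [DESC]
    pos = fraction * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    # Interpolate between the two ordered rows that bracket the position.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


temperatures = [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10]
fractions = [0, 0.25, 0.5, 0.75, 1]

asc_values = [percentile_cont(temperatures, f) for f in fractions]
desc_values = [percentile_cont(temperatures, f, descending=True) for f in fractions]
print("ORDER BY temperature:     ", asc_values)
print("ORDER BY temperature DESC:", desc_values)
```

Reversing the order maps fraction f to 1 - f. That is why ORDER BY ... DESC returns what the question calls the first quartile. For this ten-row sample the interpolated quartiles come out as -3.25, -5.5 and -7.75. They are not the hand-picked -2.5, -5 and -7.5, so expect small differences from figures computed with another quartile convention.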

Related

How to fix “#DIV/0!” error from Excel import?

I have a spreadsheet that works properly in Excel. However, when I import it into Google Sheets, it gives me the #DIV/0! error. I am at a loss for how to fix this.
I am trying to rank the items based on the number in column P. I would like the highest number in column P to be ranked 1, then 2, 3, etc. If two numbers in column P are the same, I would like them to share the same rank. However, I don't want the formula to then skip the next number in the ranking order. Also, I am not sure if it matters, but column P displays a number that is actually produced by a formula. Example:
The Points column is populated using the following formula:
=SUM(H2,J2,L2,N2,O2)
Points Rank
5 3
3 4
8 1
3 4
6 2
2 5
=SUMPRODUCT((P2 < P$2:P$36)/COUNTIF(P$2:P$36,P$2:P$36))+1
Any ideas?
Add the opposite of the numerator to the denominator to ensure you never receive #DIV/0!.
=SUMPRODUCT((P2 < P$2:P$36)/(COUNTIF(P$2:P$36,P$2:P$36)+(P2 >= P$2:P$36)))+1
When (P2 < P$2:P$36) is false, the numerator will be zero so it doesn't matter what the denominator is as long as it isn't zero.
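The ranking formula relies on a counting trick. For a given value p, each larger value q contributes 1/COUNTIF(q) per occurrence, so every distinct larger value adds exactly 1. The result is a dense rank: ties share a rank and no numbers are skipped.

The Python sketch below reproduces that arithmetic. The function names are hypothetical, and the Points values come from the example. It does not try to model how Google Sheets treats blank cells in P$2:P$36.

```python
from fractions import Fraction


def dense_rank_sumproduct(points):
    """Mirror of =SUMPRODUCT((P2 < P$2:P$36)/COUNTIF(P$2:P$36,P$2:P$36))+1.

    Each larger value q adds 1/count(q) once per occurrence, so each
    distinct larger value contributes exactly 1 to the total.
    """
    ranks = []
    for p in points:
        total = sum(Fraction(int(p < q), points.count(q)) for q in points)
        ranks.append(int(total) + 1)
    return ranks


def dense_rank(points):
    """Same result stated directly: 1 + number of distinct larger values."""
    distinct = set(points)
    return [1 + sum(1 for v in distinct if v > p) for p in points]


points = [5, 3, 8, 3, 6, 2]  # the Points column from the example
print(dense_rank_sumproduct(points))
print(dense_rank(points))
```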

influxdb query basic percentage calculation

I want to calculate the ratio between the number of values equal to zero and the number of values different from zero in the same table:
SELECT (count("value") WHERE value = 0 / count("value") WHERE value != 0) * 100 FROM "ping_rtt" WHERE time < now() - 15
Obviously this is wrong, and I was wondering what the correct way to structure the query would be.
If your field values are only zeros and ones, you can easily calculate the percentage as:
SELECT 100*sum(value)/count(value) from your_metric
Or simply use the MEAN() function instead of sum/count.
But if the values are arbitrary numbers, there is a trick based on the fact that the current InfluxDB implementation evaluates zero divided by zero as zero. First map your field values to zeros and ones, then calculate the percentage:
SELECT 100*sum(map_value)/count(map_value) FROM (SELECT value/value as map_value FROM your_metric)
It works properly in my InfluxDB 1.6.0. Suppose there is a metric called metric which contains a field val:
> select * from metric
name: metric
time tag val
---- --- ---
1539780859073434500 15
1539780862064944400 10
1539780865272757400 7
1539780867449546100 0
1539780880145442700 -8
1539781131768616600 12 0
1539781644977103800 12 0.5
1539781649113051900 12 1.5
As you can see, there are various float values such as 0, -8, 1.5, and 0.5.
We can now map our val field to zero or one:
> select val/val as normal_val from metric
name: metric
time normal_val
---- ----------
1539780859073434500 1
1539780862064944400 1
1539780865272757400 1
1539780867449546100 0
1539780880145442700 1
1539781131768616600 0
1539781644977103800 1
1539781649113051900 1
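The mapping trick can be checked outside InfluxDB with a short Python sketch. It hard-codes the 0/0 = 0 behaviour described above for InfluxDB 1.x; Python itself would raise ZeroDivisionError. The sketch uses the val column from the example. It also derives the zero-to-non-zero ratio the question originally asked for, from the same two aggregates. The helper name normalise is hypothetical.

```python
def normalise(value):
    """Map a value to 1 if non-zero, else 0, imitating value/value under
    the InfluxDB 1.x behaviour described above where 0/0 evaluates to 0."""
    return 0 if value == 0 else value / value


vals = [15, 10, 7, 0, -8, 0, 0.5, 1.5]  # the val column from the example
mapped = [normalise(v) for v in vals]

non_zero = sum(mapped)   # sum(map_value)
total = len(mapped)      # count(map_value)
zero = total - non_zero

print("non-zero share (%):", 100 * non_zero / total)
print("zero / non-zero (%):", 100 * zero / non_zero)
```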

Google Sheets Query Group By / First-N-Per-Group

I'm trying to find a simple solution for first-n-per-group.
I have a table of data: the first column holds dates and the rest hold data. I want to group by date, as multiple entries per date are allowed. The second column holds some numbers, but I want the FIRST record for each date.
Currently the only aggregate function I could use is MIN(), but that returns the lowest value, not the first.
A B
01/01/2018 10
01/01/2018 15
02/01/2018 10
02/01/2018 2
02/01/2018 100
02/01/2018 20
03/01/2018 5
03/01/2018 2
Desired output
A B
01/01/2018 10
02/01/2018 10
03/01/2018 5
Current results using MIN() - undesired
A B
01/01/2018 10
02/01/2018 2
03/01/2018 2
It's a shame there isn't a FIRST() aggregate function in Google Sheets, which would make this a lot easier.
I saw a couple of examples using the row number and an array query, but they don't seem to work for me. There are about 5000 rows of data, so I'm trying to keep this as efficient as possible and avoid recalculating the entire sheet on every change, which takes a few seconds each time.
Currently I have this, which appends a third column with the Row Number:
=query({A1:B, arrayformula(row(A1:B))}, "select min(Col1),min(Col2) group by Col1")
Thanks
EDIT 1
A suggested solution was =SORTN(A:B,2^99,2,1,1), which is clean and simple. However, it requires a large range of "free space" to display the returned dataset. Imagine 3000+ rows.
I was hoping for a QUERY()-based solution, as I want to do further operations with the results. Specifically, I want to count the occurrences of distinct values.
For example: I wanted a returned dataset of
A B
01/01/2018 10
02/01/2018 10
03/01/2018 5
I then want to count the occurrences of those values, ignoring the dates. For example:
B C
10 2
5 1
Perhaps I've confused the situation by using numbers? the "data" in ColB is TEXT (short 3 letter codes), however I used numbers to show I couldn't use MIN() function as that returns the numerically lowest value.
So in brief:
Go through all rows (3000+ rows) and group by the FIRST row of a particular date
return the FIRST value of that row
COUNT() all unique occurrences of those FIRST values, disregarding the date. Just a list with the unique values and their count (again, only the first one of any particular day)
=SORTN(A:B,2^99,2,1,1)
If your data is sorted as in the sample, you can easily remove duplicates with SORTN().
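Stated outside the spreadsheet, the task is "keep the first row per date, then count the surviving values". The Python sketch below shows the two steps. The helper names are hypothetical. The sample rows are copied from the question, with column B treated as text. Like the answer, it assumes the rows are already sorted by date, so that "first" is well defined.

```python
from collections import Counter

# Sample rows from the question: (date in column A, value in column B as text).
rows = [
    ("01/01/2018", "10"),
    ("01/01/2018", "15"),
    ("02/01/2018", "10"),
    ("02/01/2018", "2"),
    ("02/01/2018", "100"),
    ("02/01/2018", "20"),
    ("03/01/2018", "5"),
    ("03/01/2018", "2"),
]


def first_per_group(rows):
    """Keep only the first row seen for each date; later rows are ignored."""
    firsts = {}
    for key, value in rows:
        firsts.setdefault(key, value)  # dicts keep insertion order
    return list(firsts.items())


firsts = first_per_group(rows)
counts = Counter(value for _, value in firsts)  # disregard the dates
print(firsts)
print(counts.most_common())
```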

GETPIVOTDATA in Google Sheets doesn't seem to be working when picking a date column (#REF error)

I am trying to get GETPIVOTDATA to work right while using dates. I have looked at multiple questions here on SO that are for GETPIVOTDATA, but none of them use a date in a reference.
I can create a pivot table with the following data and pull out the total for a given division and subdivision. But I can't figure out how to handle dates correctly in the Google Sheets version of GETPIVOTDATA, even though my formula works in MS Excel.
This data comes from the Google Docs support page: https://support.google.com/docs/answer/6167538?hl=en
division | subdivision | product number | number of units | Date | price per unit
east | 1 | 1 | 14 | 3/1/2018 | $10
east | 2 | 1 | 15 | 3/1/2018 | $11
west | 1 | 1 | 11 | 3/3/2018 | $10
west | 2 | 1 | 21 | 3/4/2018 | $9
east | 3 | 1 | 16 | 3/1/2018 | $8
west | 3 | 1 | 18 | 3/6/2018 | $12
east | 4 | 1 | 11 | 3/7/2018 | $9
east | 1 | 2 | 10 | 3/1/2018 | $9
east | 2 | 2 | 9 | 3/9/2018 | $13
west | 1 | 2 | 12 | 3/10/2018 | $10
west | 2 | 2 | 15 | 3/1/2018 | $10
east | 3 | 2 | 12 | 3/12/2018 | $9
west | 3 | 2 | 16 | 3/1/2018 | $12
east | 4 | 2 | 12 | 3/14/2018 | $9
The pivot table is anchored at H1. Its columns division, subdivision, Date, and SUM of number of units are in cells H1, I1, J1, and K1 respectively.
=GETPIVOTDATA(K1,H1,"division", "east", "subdivision", 4) returns 23
=GETPIVOTDATA(K1,H1,"division", "east", "subdivision", 4, "Date", datevalue("2018-3-07")) returns #REF!
=GETPIVOTDATA(K1,H1,"division", "east", "subdivision", 4, "Date", DATE(2018, 3, 7)) returns #REF!
It should return 11, which is the intersection of east, 4, and 3/7.
The #REF errors come with the message "Field combination not found in pivot table for function GETPIVOTDATA", even though it seems like all of the fields are listed. As you can see, I can get my summary value if I use only division and subdivision, but not when I add the Date field. I have tried multiple ways to match the date value in the pivot table.
I am flustered. What silly thing am I missing here? Please check that your answer actually works in Google Sheets before suggesting it :)
Thanks!
I know that I'm absurdly late, but this was bothering me as well and I could not figure it out.
I finally realized that the GETPIVOTDATA method uses the total rows, and will throw an error if the correct totals are not there.
Hopefully this helps people who find this like I did.
It turns out that the value argument (technically the pivot_item argument) for the date argument (original_column) must be text that matches the format of the date as it is formatted in the SOURCE of the pivot table, i.e. in the data.
So if the date item is formatted as 3/7/2018 in the original data, then, regardless of how you format the date in the pivot table, a formula that works would be:
=GETPIVOTDATA(K1,H1,"division", "east", "subdivision", 4, "Date", "3/7/2018")
If the same value recurs later in the data but is formatted differently, e.g. 3/7 (no year), then as far as I can tell, the first occurrence of that value is used as the reference format. So the formula above would capture all 3/7/2018 data*, assuming the first 3/7/2018 data point is formatted as such.
If another date in the data is formatted as 3/8 Thu (first occurrence), then that's the text that needs to be used in the Pivot Item of the formula.
Google's definition of the Pivot Item is:
pivot_item… - [optional] repeatable: The name of the row or column shown in the pivot table corresponding to original_column that you want to retrieve.
It says the name of the row or column. Maybe Google meant that literally, but probably not, considering this function is borrowed from Excel, and Excel matches the value regardless of its format.
*subject to the other pivot columns/items (division & subdivision)
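The behaviour described in this answer amounts to an exact text match on the pivot item. The Python sketch below is only an analogy, not how Google Sheets is implemented. It builds a tiny pivot from the support-page data, keyed by the date text exactly as it appears in the source. Looking up "3/7/2018" finds 11, while "2018-03-07" fails in the same way GETPIVOTDATA returns #REF!. All function names are hypothetical.

```python
from datetime import date

# (division, subdivision, number of units, Date as text in the source data)
SOURCE = [
    ("east", 1, 14, "3/1/2018"), ("east", 2, 15, "3/1/2018"),
    ("west", 1, 11, "3/3/2018"), ("west", 2, 21, "3/4/2018"),
    ("east", 3, 16, "3/1/2018"), ("west", 3, 18, "3/6/2018"),
    ("east", 4, 11, "3/7/2018"), ("east", 1, 10, "3/1/2018"),
    ("east", 2, 9, "3/9/2018"), ("west", 1, 12, "3/10/2018"),
    ("west", 2, 15, "3/1/2018"), ("east", 3, 12, "3/12/2018"),
    ("west", 3, 16, "3/1/2018"), ("east", 4, 12, "3/14/2018"),
]


def build_pivot(rows):
    """Sum units per (division, subdivision) and per (division, subdivision, date text)."""
    totals, detail = {}, {}
    for division, subdivision, units, date_text in rows:
        key = (division, subdivision)
        totals[key] = totals.get(key, 0) + units
        detail[key + (date_text,)] = detail.get(key + (date_text,), 0) + units
    return totals, detail


def get_pivot_data(totals, detail, division, subdivision, date_item=None):
    """Hypothetical GETPIVOTDATA analogue: items must match the stored text exactly."""
    if date_item is None:
        table, key = totals, (division, subdivision)
    else:
        table, key = detail, (division, subdivision, date_item)
    if key not in table:
        # Plays the role of the "Field combination not found" #REF! error.
        raise KeyError(f"Field combination not found: {key}")
    return table[key]


def as_source_text(d):
    """Render a date as the source column shows it: M/D/YYYY, no zero padding."""
    return f"{d.month}/{d.day}/{d.year}"


totals, detail = build_pivot(SOURCE)
print(get_pivot_data(totals, detail, "east", 4))
print(get_pivot_data(totals, detail, "east", 4, as_source_text(date(2018, 3, 7))))
try:
    get_pivot_data(totals, detail, "east", 4, "2018-03-07")
except KeyError as err:
    print("lookup failed:", err)
```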
Combining answers from bsoo and Amos47 fixed it for me.
Make sure you have "Show Totals" ticked on your pivot table.
For dates, don't use datevalue("2018-3-07"). Instead, use the TEXT function to match the format that you have in the pivot table. This can vary with the spreadsheet locale. For example, if you are referencing a cell A4, you might use =GETPIVOTDATA(K1,H1,"division", "east", "subdivision", 4, "Date", text(A4,"yyyy-mm-dd")).
With dates, I find that using either the ISO 8601 format "yyyymmdd" or the text version of the month "dd-mmm-yy" helps avoid errors if the sheet switches between US and European formats.
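The ambiguity that motivates ISO 8601 or month-name formats is easy to demonstrate: the same short text parses as two different dates under US and European conventions. A Python sketch follows.

Python's strftime codes (%Y%m%d, %Y-%m-%d, %d-%b-%y) are assumed equivalents of the Sheets TEXT patterns "yyyymmdd", "yyyy-mm-dd" and "dd-mmm-yy". They are not the same syntax. Also, %b depends on the active locale.

```python
from datetime import date, datetime

d = date(2018, 3, 7)

iso_basic = d.strftime("%Y%m%d")       # like TEXT(..., "yyyymmdd")
iso_extended = d.strftime("%Y-%m-%d")  # like TEXT(..., "yyyy-mm-dd")
month_text = d.strftime("%d-%b-%y")    # like TEXT(..., "dd-mmm-yy"); locale-dependent

# The same short text reads as two different dates under US and European rules.
us = datetime.strptime("3/7/2018", "%m/%d/%Y").date()
eu = datetime.strptime("3/7/2018", "%d/%m/%Y").date()

print(iso_basic, iso_extended, month_text)
print(us, eu)
```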

Negative References or reversing order of column for DATEDIF

I have an ascending sorted list of irregular dates in column A:A:
A B C D (A:A,A2:A) E (A:A,A3:A)
2017-11-09 10 10 NA NA
2017-11-10 11 21 1 NA
2017-11-14 15 36 4 5
2017-11-15 22 58 1 5
Column C:C is a rolling sum of B:B. I'm trying to get an ArrayFormula in D:D/E:E that finds the DATEDIF between the row X rows above (start date) and the current row (end date):
=ArrayFormula(DATEDIF(B:B-(X Rows),B:B,"D"))
The goal is to find the rate of change in C:C over X days:
(C:C - C:C X rows up) / DATEDIF(A:A X rows up, A:A)
e.g., for 2 days on row C4:
(C4-C2) / datedif(C4-2,C4,"D")
(58-21) / datedif(C2,C4,"D")
37 / 5 = 7.4
for 5 days on row C10:
(C10-C5) / datedif(C10-5,C10,"D")
for 15 days on row C20:
(C20-C5) / datedif(C20-15,C20,"D")
I'm trying to calculate this for X = 1, 2, 3, 4, 7, and 28 rows up, which means each array has to start 1, 2, 3, 4, 7, or 28 rows down.
Right now, the array fails with a bad reference because the first start date is DATEDIF(B-X,B1,"D"), where B-X is an invalid negative reference. ArrayFormulas with bad values (as opposed to bad references) seem to just skip past the errors and start working once the inputs are valid, but I can't figure out how to skip bad references. I've tried forcing the start date with INDIRECT, but I can't get it to recognize the value as a date. I also tried DATEDIF(B:B, B:B+X,"D"), which gives the correct numbers, but the results are offset by X rows. I've also tried reverse-sorting A:A:
=ArrayFormula(if(len(A:A),DATEDIF(SORT(A2:A,1,0),SORT(A:A,1,0),"D"),""))
This produces a reverse-ordered list of correct answers that I can't figure out how to flip back.
Seems like I'm missing something obvious?
EDIT: Tried to clarify the original post.
Is there an easy way to displace (shift down) an entire column?
Alternative solution?
The formula roughly works but is not aligned to the correct row:
C D E
1 2 3
1 2 3
1 2 3
1 2
1
I just need it to display
C D E
1
1 2
1 2 3
1 2 3
1 2 3
To get things aligned, I can put in cell on row2 of Column F:
=array_constrain(ARRAYFORMULA(D:D),COUNT(A:A)-2,1)
Or cell in row3 of Column G:
=array_constrain(ARRAYFORMULA(E:E),COUNT(A:A)-3,1)
But if I try to trigger the formula from row 1 via:
=arrayformula(if(row(A:A)>=2,array_constrain(D:D,COUNT(A:A)-2,1)))
It labels the rows that fail the row >= 2 test FALSE, and it still renders D:D without shifting the cells down by the required number of rows:
C D
1 false
1 2
1 2
1 2
1
EDIT: I'm closing the request. I ended up just using vlookup(B:B-X), which gave an approximate enough result for my needs.
Short answer
Add the following formula to D1
=ArrayFormula({"N/A";ARRAY_CONSTRAIN(DATEDIF(A:A,A2:A,"D"),COUNT(A:A)-1,1)})
And the following formula to E1
=ArrayFormula({"N/A";"N/A";ARRAY_CONSTRAIN(DATEDIF(A:A,A3:A,"D"),COUNT(A:A)-2,1)})
Explanation
The solution uses ARRAY_CONSTRAIN to return just the required result values. It then uses array notation to prepend the required N/A values for the rows that don't have a pair to calculate the date difference.
REMARK:
Please note that the DATEDIF functions use the column A for the references as this column is the one that holds the date values.
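The same idea can be written in Python to check the numbers: pad the first X rows with N/A and pair each remaining row with the row X above it. This sketch uses the four sample rows from the question and hypothetical helper names. rate_of_change reproduces the question's (C4 - C2) / 5 = 7.4 example for X = 2. It assumes lag >= 1 and strictly increasing dates, so no day difference is zero.

```python
from datetime import date
from itertools import accumulate

# Columns A and B from the example; C is the running total of B.
dates = [date(2017, 11, 9), date(2017, 11, 10), date(2017, 11, 14), date(2017, 11, 15)]
amounts = [10, 11, 15, 22]
running = list(accumulate(amounts))  # column C


def lagged_day_diff(dates, lag, filler="N/A"):
    """Days between each date and the date `lag` rows above it (lag >= 1).

    The first `lag` rows have no partner, so they get `filler`, like the
    {"N/A"; ...} prefix in the ARRAY_CONSTRAIN formulas above.
    """
    head = [filler] * min(lag, len(dates))
    return head + [(dates[i] - dates[i - lag]).days for i in range(lag, len(dates))]


def rate_of_change(totals, dates, lag, filler="N/A"):
    """(C[i] - C[i - lag]) / days between A[i - lag] and A[i]."""
    days = lagged_day_diff(dates, lag, filler)
    return [
        filler if i < lag else (totals[i] - totals[i - lag]) / d
        for i, d in enumerate(days)
    ]


print(lagged_day_diff(dates, 1))
print(lagged_day_diff(dates, 2))
print(rate_of_change(running, dates, 2))
```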
