Suppose I have this data:
time value
---- ----
0 28
1 27
2 26
3 25
4 26
5 27
I want to get the values greater than 25, split into groups of consecutive points, as follows:
Group1
time value
---- ----
0 28
1 27
2 26
Group2
time value
---- ----
4 26
5 27
Is there any way to do that with one query, or should I post-process the data?
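If post-processing turns out to be the way to go, the grouping itself is a simple split-on-gap pass over the result rows. Here is a minimal sketch (in Python, purely for illustration since the question doesn't name a client language; the rows list mirrors the sample above):

rows = [(0, 28), (1, 27), (2, 26), (3, 25), (4, 26), (5, 27)]

groups, current = [], []
for t, v in rows:
    if v > 25:
        # A gap in the time axis closes the running group.
        if current and t != current[-1][0] + 1:
            groups.append(current)
            current = []
        current.append((t, v))
    elif current:
        # A value <= 25 also closes the running group.
        groups.append(current)
        current = []
if current:
    groups.append(current)

# groups -> [[(0, 28), (1, 27), (2, 26)], [(4, 26), (5, 27)]]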
I have the following minimal example data (in reality, hundreds of groups) in range A1:P9 (the same data repeated in range A14:P22):
With Sample A1:AR9:
[screenshot: the sample input in A1:P9 plus the expected output in Q1:AR3; the individual cell values were flattened by text extraction and are not reproduced here]
Or Sample A14:AQ22:
[screenshot: the same input in A14:P22 with the alternative expected output in Q14:AQ16, likewise flattened by text extraction]
I need the output as shown in range Q1:AR3 or as in range Q14:AQ16.
Basically, at each group delimited by / in between the values in Column A, I would need:
The intermediary adjacent values in Column B to be transposed horizontally,
And the adjacent content of Columns C to P (14 columns, at least) to be "joined" together horizontally and sequentially per group, including the content of the delimiter's row (in Column A).
As a bonus, it would be really nice to have the transposed data followed by a ":", and each sub-content of Columns C to P separated by a "|" as well (as shown in screenshot Q1:AR3 or Q14:AR16).
(Or, if it's more feasible, alternatively the simpler-to-read 2nd model as in A14:AQ22.)
I have a really hard time putting together a formula to get to the expected result.
All I could think of was:
Transposing Column B's content by getting the rows of the adjacent cells with values in Column A,
Concatenating with the column letter,
Duplicating it in a new column, and filtering out the blank intermediary cells,
Then shifting the duplicated column 1 cell up,
Then concatenating within a TRANSPOSE formula to get the range of the groups,
Then finally transposing all the groups from Column B in a new column
(very convoluted, but I couldn't find a better way).
To get to that intermediate input:
=TRANSPOSE(B1:B3)
=TRANSPOSE(B4:B5)
=TRANSPOSE(B7:B9)
That was already a very manual and error-prone process, and I still could not work out how to do the remaining content-joining of Columns C to P in a formula.
I tested the following approach, but it's not working, and it would be a very tedious process to fix and to implement on large datasets:
=TRANSPOSE(B1:B3)&": "&JOIN( " | " , FILTER(C1:P1, NOT(C2:P2 = "") ))&JOIN( " | " , FILTER(C2:P2, NOT(C2:P2 = "") ))&JOIN( " | " , FILTER(C43:P3, NOT(C3:P3 = "") ))
=TRANSPOSE(B4:B5)&": "&JOIN( " | " , FILTER(C4:P4, NOT(C4:P4 = "") ))&JOIN( " | " , FILTER(C5:P5, NOT(C5:P5 = "") ))
=TRANSPOSE(B6:B9)&": "&JOIN( " | " , FILTER(C6:P6, NOT(C6:P6 = "") ))&JOIN( " | " , FILTER(C7:P7, NOT(C7:P7 = "") ))&JOIN( " | " , FILTER(C8:P8, NOT(C8:P8 = "") ))&JOIN( " | " , FILTER(C8:P8, NOT(C9:P9 = "") ))
What better approach should I favor to reach the expected result? Preferably with a formula, or if that's not possible, with a script.
Any help is greatly appreciated.
For Sample 1 try this out:
=LAMBDA(norm,MAP(UNIQUE(norm),LAMBDA(ζ,{TRANSPOSE(FILTER(B1:B9,norm=ζ)),":",SPLIT(BYROW(TRANSPOSE(FILTER(BYROW(C1:P9,LAMBDA(r,TEXTJOIN("ζ",1,r))),norm=ζ)),LAMBDA(rr,TEXTJOIN("γ|γ",1,rr))),"ζγ")})))(SORT(SCAN(,SORT(A1:A9,ROW(A1:A9),),LAMBDA(a,c,IF(c="",a,c))),ROW(A1:A9),))
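In outline (hedged, since the exact fill direction depends on where the delimiters sit in Column A): the trailing SORT(SCAN(...)) builds norm, which fills every blank in A1:A9 with its group's delimiter value so that each row carries a group key. MAP then walks the UNIQUE keys; per group it transposes the matching Column B values, appends a ":" cell, TEXTJOINs each row of C1:P9 with the placeholder character "ζ" and the group's rows with "γ|γ", and the final SPLIT on "ζγ" re-expands everything into separate cells, leaving "|" between the row chunks.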
I'm querying data from different shards and used EXPLAIN to check how many series are being fetched for that particular date range.
> SHOW SHARDS
.
.
658 mydb autogen 658 2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z
676 mydb autogen 676 2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-08-03T00:00:00Z
.
.
Executing EXPLAIN for data from shard 658 gives the expected result in terms of the number of series. SensorId is the only tag key, and since the date range falls into a single shard, it reports NUMBER OF SERIES: 1.
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00' limit 10;
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482
But when I run the same query on a date range that falls into shard 676, the number of series is 13140 instead of just one.
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-29 00:00:00' AND time < '2019-07-30 00:00:00';
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 13140
CACHED VALUES: 0
NUMBER OF FILES: 11426
NUMBER OF BLOCKS: 23561
SIZE OF BLOCKS: 108031642
Environment info:
System info: Linux 4.4.0-1087-aws x86_64
InfluxDB version: InfluxDB v1.7.6 (git: 1.7 01c8dd4)
Update - 1
On checking field cardinality, I observed a spike in RAM.
> SHOW FIELD KEY CARDINALITY
Update - 2
I've rebuilt the indexes, but the cardinality is still high.
Update - 3
I found out that the shard has "SensorId" both as a tag and as a field, which causes high cardinality when querying with the "SensorId" filter.
> SELECT COUNT("SensorId") from Reading GROUP BY "SensorId";
name: Reading
tags: SensorId=
time count
---- -----
1970-01-01T00:00:00Z 40
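(As a cross-check, SHOW FIELD KEYS should list SensorId with a field type if it has ever been written as a field; Reading is the measurement from the queries above.)
> SHOW FIELD KEYS FROM Reading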
But when I check tag values with key 'SensorId', it doesn't show the empty string that appears in the query above.
> show tag values with key = "SensorId"
name: Reading
key value
--- -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
Update - 4
I inspected the data using influx_inspect dumptsm and re-validated that null tag values are present:
$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm
Index:
Pos Min Time Max Time Ofs Size Key Field
1 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 5 103 Reading 1001
2 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 108 275 Reading 2001
3 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 383 248 Reading 2002
4 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 631 278 Reading 2003
5 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 909 278 Reading 2004
6 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1187 184 Reading 2005
7 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1371 103 Reading 2006
8 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1474 250 Reading 2007
9 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1724 103 Reading 2008
10 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1827 275 Reading 2012
11 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2102 416 Reading 2101
12 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2518 103 Reading 2692
13 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2621 101 Reading SensorId
14 2019-07-29T00:00:05Z 2019-07-29T05:31:07Z 2722 1569 Reading,SensorId=10034 2005
15 2019-07-29T05:31:26Z 2019-07-29T11:03:54Z 4291 1467 Reading,SensorId=10034 2005
16 2019-07-29T11:04:14Z 2019-07-29T17:10:16Z 5758 1785 Reading,SensorId=10034 2005
I am new to using Cognos. I have data covering all of our projects, and I need to create some kind of table, or maybe a crosstab, to count the projects in each year and how many of them are active, cancelled, and inactive.
I have tried using a crosstab but with no success.
ProjectId Status Date
1589 Active 8/29/2018
1566 Inactive 4/17/2018
1042 Cancelled 1/6/2014
1374 Completed 1/20/2015
1543 Completed 8/4/2014
1065 Cancelled 7/15/2014
1397 Completed 10/1/2012
1520 Inactive 4/13/2017
1420 Completed 1/1/2015
1443 Completed 1/1/2015
1048 Cancelled 10/16/2014
1002 Active 2/6/2017
1357 Completed 1/19/2017
1606 Active 11/6/2018
Output should look like this:
Year New Projects Active Cancelled/Terminated Inactive Carried Forward
2013 32 45 4 11 30
2014 45 75 17 14 44
2015 46 90 25 21 44
2016 30 74 27 10 37
2017 82 119 11 26 82
2018 86 168 29 24 115
2019 23 138 9 4 125
Going with project Id, status, Date:
The ideal scenario is that you already have a data item for the year. If not, get the year with:
extract(year, Date)
Then create calculation data items for each count. For example, this one is for Active:
if (status = 'Active') Then (1) Else (0)
For the properties, make sure the aggregation is set to Total.
Summing the column then gives you the count, as sketched below.
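Putting that together, a sketch of the full set of data items (the item names are illustrative, and the status literals assume the values shown in the sample data):

Year:      extract(year, [Date])
Active:    if ([Status] = 'Active') Then (1) Else (0)
Cancelled: if ([Status] = 'Cancelled') Then (1) Else (0)
Inactive:  if ([Status] = 'Inactive') Then (1) Else (0)

With Year on the crosstab rows, the count items as measures, and aggregation set to Total, the per-year counts come out directly.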
I have been trying to run a Cox PH model on a sample data set of 10k customers (randomly taken from a 32-million-customer base) to predict the probability of survival at time t (which is a month in my case). I am using recurrent-event survival analysis with the counting-process formulation for e-commerce. For this:
1. Observation starting point: right after a customer makes first purchase
2. Start/Stop times: Months of two consecutive purchases (as in the data)
I have a few independent variables as in the sample data below:
id start stop status tenure orders revenue Quantity
A 0 20 0 0 1 $89.0 1
B 0 17 0 0 1 $556.0 2
C 0 17 0 0 1 $900.0 2
D 32 33 0 1679 9 $357.8 9
D 26 32 1 1497 7 $326.8 7
D 23 26 1 1405 4 $142.9 4
D 17 23 1 1219 3 $63.9 3
D 9 17 1 978 2 $50.0 2
D 0 9 1 694 1 $35.0 1
E 0 15 0 28 2 $156.0 2
F 0 15 0 0 1 $348.0 1
F 12 14 0 375 2 $216.8 3
F 0 12 1 0 1 $67.8 2
G 9 15 0 277 2 $419.0 2
G 0 9 1 0 1 $359.0 1
While running Cox PH using the following code:
library(survival)
fit10 <- coxph(Surv(start, stop, status) ~ orders + tenure + Quantity + revenue, data = test)
I keep getting the following error:
Warning: X matrix deemed to be singular; variable 1 2 3 4
I tried searching for this error online, but the answers I found said it could be caused by collinear (linearly dependent) independent variables, whereas my variables are individual and continuous.
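One quick way to check whether the singularity really comes from (near-)linear dependence among the covariates is their pairwise correlation matrix. A minimal sketch, assuming the data frame is named test as in the code above and that revenue is stored as text with a leading "$" as printed:

# Strip the "$" so revenue can be treated as a number (assumption: it is
# currently a character column formatted like "$89.0").
test$revenue_num <- as.numeric(gsub("[$,]", "", test$revenue))

# Correlations near 1 or -1 flag the near-linear dependence that makes
# the X matrix singular.
cor(test[, c("orders", "tenure", "Quantity", "revenue_num")], use = "complete.obs")

In the sample shown, orders and Quantity are identical on most rows, so those two columns are close to perfectly collinear; refitting with one of them dropped is a quick way to see whether the warning disappears.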
Below is my string:
local Amount =[[
Customer Details Net Amount
# Seq Name
Amount NTR
1 CDABCDEFGHIJ00564
0,1234
2 CDABCDEFGHIJ00565
0,0361
3 CDABCDEFGHIJ00566
0,0361
4 CDABCDEFGHIJ00567
0,0722
5 CDABCDEFGHIJ00568
0,0000
6 CDABCDEFGHIJ00569
0,0000
7 CDABCDEFGHIJ00570
0,0000
8 CDABCDEFGHIJ00571
0,7091
9 CDABCDEFGHIJ00572
1,4240
10 CDABCDEFGHIJ00573
0,0361
11 CDABCDEFGHIJ00574
0,5790
12 CDABCDEFGHIJ00575
0,4060
13 CDABCDEFGHIJ00576
0,3610
14 CDABCDEFGHIJ00577
0,6859
15 CDABCDEFGHIJ00578
0,2888
16 CDABCDEFGHIJ00579
0,0000
17 CDABCDEFGHIJ00580
0,0000
18 CDABCDEFGHIJ00581
0,0000
19 CDABCDEFGHIJ00582
0,0000
20 CDABCDEFGHIJ00583
0,0000
21 CDABCDEFGHIJ00584
0,0000
22 CDABCDEFGHIJ00585
0,8978
23 CDABCDEFGHIJ00586
0,0000
24 CDABCDEFGHIJ00587
2,3882
25 CDABCDEFGHIJ00588
0,0000
26 CDABCDEFGHIJ00589
2,0216
27 CDABCDEFGHIJ00590
1,7540
28 CDABCDEFGHIJ00591
0,0000
29 CDABCDEFGHIJ00592
0,0722
30 CDABCDEFGHIJ00593
0,0361
31 CDABCDEFGHIJ00594
0,0000
32 CDABCDEFGHIJ00595
0,0000
Total NAT files
11,9269
Direct inquiries to:
]]
I'm parsing it by executing the code below:
local ptrn = '\n([%d%p]+)\n'
for val1, val2 in string.gmatch(Amount, ptrn) do
print ("val1:=\t" .. (val1 or '').."\tval2:=\t"..(val2 or ''))
end
Basically, from the above string I want to fetch the last 5 digits of the ID (00564) into val1 and the amount (0,1234) into val2, but all of this should happen in one pattern. The data comes in records, and every record starts with a number. This is the 1st record/row:
1 CDABCDEFGHIJ00564
0,1234
and this is the 2nd record/row, and so on:
2 CDABCDEFGHIJ00565
0,0361
Please help.
It seems to me that %d+%s+%a+(%d+)\n%s*([%d,]+) should do the trick: the first %d+ catches the row number, and %s+ matches the whitespace after it. %a+(%d+) matches something like CDABCDEFGHIJ00564 and captures the digits at the end (Lua patterns have no {n}-style quantifier, but you can write (%d%d%d%d%d) if you need exactly five digits). \n%s* matches the newline plus any whitespace at the start of the next line, and ([%d,]+) captures the last number, comma included.
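For reference, here is the question's loop with only the pattern swapped out:

-- One iteration per record: val1 captures the trailing digits of the ID,
-- val2 captures the comma-decimal amount on the following line.
local ptrn = '%d+%s+%a+(%d+)\n%s*([%d,]+)'
for val1, val2 in string.gmatch(Amount, ptrn) do
    print("val1:=\t" .. val1 .. "\tval2:=\t" .. val2)
end

Run against the Amount string above, the first iteration prints 00564 and 0,1234; the "Total NAT files" trailer yields no spurious match, because nothing there has a letter run immediately followed by digits.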