Remove rows that are near duplicate of previous row in Deedle dataframe

I have a Deedle Data frame that looks like this.
val it : Frame<int,string> =
Date size1 size2
13 -> 2013-12-12T00:00:00.103336Z 133 35
14 -> 2013-12-12T00:00:00.105184Z 83 35
15 -> 2013-12-12T00:00:00.107205Z 83 35
16 -> 2013-12-12T00:00:00.109566Z 83 34
17 -> 2013-12-12T00:00:00.115260Z 83 34
18 -> 2013-12-12T00:00:00.133546Z 83 34
20 -> 2013-12-12T00:00:00.138204Z 82 34
22 -> 2013-12-12T00:00:00.140125Z 81 34
I would like to remove rows that have the same values for both size1 and size2 as the previous row. In pseudo code...
if row?size1 = prevRow?size1 && row?size2 = prevRow?size2 then dropRow
So in the example above I would end up with:
val it : Frame<int,string> =
Date size1 size2
13 -> 2013-12-12T00:00:00.103336Z 133 35
14 -> 2013-12-12T00:00:00.105184Z 83 35
16 -> 2013-12-12T00:00:00.109566Z 83 34
20 -> 2013-12-12T00:00:00.138204Z 82 34
22 -> 2013-12-12T00:00:00.140125Z 81 34
I believe I want to use
Frame.filterRowValues (fun row -> ...)
But I don't see how to compare one row against the previous row. Is there a simple way to do this? Perhaps I need to shift and join?

There are several ways to do this, and I'm not sure which one is best:
1. Using shift and join (as you suggest) would certainly work. You'd need to rename the columns in one of the frames before you can join them, but it sounds like a good solution to me.
2. You can use frame.Rows |> Series.pairwise to get tuples containing the previous and the current row, then use Series.filter and Series.map (to select the second row from each tuple), and reconstruct the frame using Frame.ofRows. The only issue is that you always lose the first row this way, so you have to add it back.
3. You can use Frame.filterRows and look up the previous row. The recent release supports Lookup.Smaller, which lets you do that easily.
The code for the third option looks like this (note that the frame rows need to be ordered, i.e. frame.Rows.IsOrdered = true, for this to work):
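frame |> Frame.filterRows (fun k row ->
    // Find the nearest row with a smaller key (Lookup.Smaller is new in v1.0)
    let prev = frame.Rows |> Series.tryLookup k Lookup.Smaller
    match prev with
    // Keep the row only if size1 or size2 differs from the previous row
    | Some prev -> prev?size1 <> row?size1 || prev?size2 <> row?size2
    | _ -> true (* always keep the first row *) )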
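
The comparison against the previous row can also be sketched outside Deedle. The following is a minimal, library-free Python illustration of the same idea, not Deedle code: the helper name drop_repeats and the tuple layout (index, size1, size2) are assumptions made for this example, and the data is taken from the question.

```python
def drop_repeats(rows, key):
    """Keep a row only if key(row) differs from key(previous row)."""
    kept = []
    prev_key = object()  # sentinel that never equals a real key, so the first row is kept
    for row in rows:
        current = key(row)
        if current != prev_key:
            kept.append(row)
        prev_key = current
    return kept


# (index, size1, size2) taken from the frame in the question
rows = [
    (13, 133, 35),
    (14, 83, 35),
    (15, 83, 35),
    (16, 83, 34),
    (17, 83, 34),
    (18, 83, 34),
    (20, 82, 34),
    (22, 81, 34),
]

result = drop_repeats(rows, key=lambda r: (r[1], r[2]))
kept_indices = [r[0] for r in result]
print(kept_indices)
```

The rows that survive are those with index 13, 14, 16, 20 and 22, which matches the expected frame in the question.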

Related

Calculate the longest continuous running time of a device

I have a table created with the following script:
n=15
ts=now()+1..n * 1000 * 100
status=rand(0 1 ,n)
val=rand(100,n)
t=table(ts,status,val)
select * from t order by ts
Here, ts is the time, status indicates the device status (0: down; 1: running), and val indicates the running time.
Suppose I have the following data:
ts status val
2023.01.03T18:17:17.386 1 58
2023.01.03T18:18:57.386 0 93
2023.01.03T18:20:37.386 0 24
2023.01.03T18:22:17.386 1 87
2023.01.03T18:23:57.386 0 85
2023.01.03T18:25:37.386 1 9
2023.01.03T18:27:17.386 1 46
2023.01.03T18:28:57.386 1 3
2023.01.03T18:30:37.386 0 65
2023.01.03T18:32:17.386 1 66
2023.01.03T18:33:57.386 0 56
2023.01.03T18:35:37.386 0 42
2023.01.03T18:37:17.386 1 82
2023.01.03T18:38:57.386 1 95
2023.01.03T18:40:37.386 0 19
So how do I calculate the longest continuous running time? For example, the 6th through 8th records all have status 1, so I want to sum their val values. The same applies to the 13th and 14th records.

You can use the built-in function segment to group the consecutive identical values. The full script is as follows:
select first(ts), sum(iif(status==1, val, 0)) as total_val
from t
group by segment(status)
having sum(iif(status==1, val, 0)) > 0
The result:
segment_status first_ts total_val
0 2023.01.03T18:17:17.386 58
3 2023.01.03T18:22:17.386 87
5 2023.01.03T18:25:37.386 58
9 2023.01.03T18:32:17.386 66
12 2023.01.03T18:37:17.386 177
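
For readers who want to check the numbers without DolphinDB, here is a rough Python sketch of the same grouping idea. It uses itertools.groupby to collect consecutive records with the same status, which is only an analogue of segment and not the DolphinDB implementation. The variable names are assumptions for this example; the (status, val) pairs come from the question.

```python
from itertools import groupby

# (status, val) pairs in timestamp order, copied from the question
records = [
    (1, 58), (0, 93), (0, 24), (1, 87), (0, 85),
    (1, 9), (1, 46), (1, 3), (0, 65), (1, 66),
    (0, 56), (0, 42), (1, 82), (1, 95), (0, 19),
]

# groupby starts a new group whenever the status changes,
# so each group is one uninterrupted stretch of the same status.
run_totals = [
    sum(val for _, val in run)
    for status, run in groupby(records, key=lambda r: r[0])
    if status == 1
]

longest = max(run_totals)
print(run_totals, longest)
```

The per-run totals agree with the total_val column above, and taking the maximum gives the longest continuous running time.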

Why does SCAN/LAMBDA give unexpected results?

It is quite possible that I don't understand the LAMBDA logic. I have a dataset in A2:A5 like this:
1
3
6
10
If I do: =SCAN(0, A2:A5, LAMBDA(aa, bb, aa+bb)) I get:
1
4
10
20
If I do: =SCAN(0, A2:A5, LAMBDA(aa, bb, ROW(bb)-1)) I get
1
2
3
4
If I run: =SCAN(0, A2:A5, LAMBDA(aa, bb, (aa+bb)*(ROW(bb)-1))) the result is:
1
8
42
208
Why are there 42 and 208? How does the formula produce these values?
Expected result is
1
8
30
80
And I can get it with:
=ArrayFormula(SCAN(0, A2:A5, LAMBDA(aa, bb, aa+bb))*(ROW(A2:A5)-1))
But not with
=SCAN(0, A2:A5, LAMBDA(aa, bb, (aa+bb)*(ROW(bb)-1)))

SCAN returns the intermediate results of a REDUCE, so to understand how SCAN operates, you need to understand how REDUCE operates. The syntax is:
=REDUCE(initial_value, array, LAMBDA(accumulator, current_value, some_function()))
Going through =SCAN(0, A2:A5, LAMBDA(aa, bb, (aa+bb)*(ROW(bb)-1))) step by step,
A2:A5 is 1, 3, 6, 10.
Step 1:
  aa = 0 (initial_value)
  bb = 1 (current_value: A2)
  Result of (aa+bb)*(ROW(bb)-1): (0+1)*(2-1) = 1
Step 2:
  aa = 1 (accumulator, the previous return value)
  bb = 3 (current_value: A3)
  Result of (aa+bb)*(ROW(bb)-1): (1+3)*(3-1) = 8
Step 3:
  aa = 8 (accumulator, the previous return value)
  bb = 6 (current_value: A4)
  Result of (aa+bb)*(ROW(bb)-1): (8+6)*(4-1) = 42
Step 4:
  aa = 42 (accumulator, the previous return value)
  bb = 10 (current_value: A5)
  Result of (aa+bb)*(ROW(bb)-1): (42+10)*(5-1) = 52*4 = 208
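
The same walk-through can be reproduced outside Sheets. The sketch below uses Python's itertools.accumulate as a stand-in for SCAN; the helper name scan and the variable names are assumptions for this illustration. It contrasts multiplying inside the accumulator with multiplying after the running total has been computed.

```python
from itertools import accumulate

values = [1, 3, 6, 10]   # contents of A2:A5
rows = [2, 3, 4, 5]      # ROW() of each cell


def scan(func, items, initial):
    """Rough analogue of SCAN: every intermediate accumulator value, without the initial one."""
    return list(accumulate(items, func, initial=initial))[1:]


# Multiplication inside the lambda: the multiplied value is fed back in as aa.
inside = scan(lambda aa, item: (aa + item[0]) * (item[1] - 1), zip(values, rows), 0)

# Running total first, multiplication afterwards: aa stays a plain sum.
running = scan(lambda aa, bb: aa + bb, values, 0)
outside = [total * (row - 1) for total, row in zip(running, rows)]

print(inside)
print(outside)
```

The first list reproduces 1, 8, 42, 208 from the steps above, while the second gives the 1, 8, 30, 80 the question expected.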

aa stores the result of the previous calculation, not the running total of the inputs, so each step starts from a value that has already been multiplied.

The answers above cover almost everything, so I will only add this.
You probably expected that (aa+bb)*(ROW(bb)-1) would give you:
(aa+bb)  *  (ROW(bb)-1)
1        *  1            =  1
4        *  2            =  8
10       *  3            =  30
20       *  4            =  80
But that's not how it works. To get your expected result without resorting to your formula where ROW is outside of SCAN:
=ArrayFormula(SCAN(0, A2:A5, LAMBDA(aa, bb, aa+bb))*(ROW(A2:A5)-1))
you would need to do:
=INDEX(MAP(SCAN(0, A2:A5, LAMBDA(aa, bb, (aa+bb))), ROW(A2:A5)-1, LAMBDA(cc, dd, cc*dd)))
where cc takes each value of the SCAN output and dd the matching value of ROW(A2:A5)-1, i.e. first compute the running total and then multiply, which makes for a rather long formula.
Or shorter, but with SEQUENCE:
=MAP(SCAN(0, A2:A5, LAMBDA(aa, bb, (aa+bb))), SEQUENCE(4), LAMBDA(cc, dd, cc*dd))

Function closures with mapslices

In the code snippet below, functions f and g return different values. From reading the code, you would expect them to behave the same. I am guessing it has to do with the closure v -> innerprodfn(m, v). How do I get the desired behaviour, where f and g return the same values?
type Mat{T<:Number}
    data::Matrix{T}
end
innerprodfn{T}(m::Mat{T}, v::Array{T}) = i -> (m.data*v)[i]
innerprodfn{T}(m::Mat{T}, vv::Matrix{T}) = mapslices(v->innerprodfn(m, v), vv, 1)
m = Mat(collect(reshape(0:5, 2, 3)))
v = collect(reshape(0:11, 3, 4))
f = innerprodfn(m, v[:,1])
g = innerprodfn(m, v)[1]
m.data * v
# 10 28 46 64
# 13 40 67 94
[f(1) g(1); f(2) g(2)]
# 10 64
# 13 94

I don't have an explanation for the observed behavior, but on a recent nightly version of Julia one gets the expected result.
On 0.5, a workaround is to use a comprehension:
innerprodfn{T}(m::Mat{T}, vv::Matrix{T}) = [innerprodfn(m, vv[:,i]) for i in indices(vv, 2)]
Of course, this works on 0.6 as well.

Inverting table

This is the table I have:
A B
1 Title1 | Title
2
3 0 | # of teachers
4
5 11 | # of students
6
7 Not active | Active?
8
9
10
11 Title2 | Title
12
13 3 | # of teachers
14
15 5 | # of students
16
17 Not active | Active?
18
19
20
21 Title3 | Title
22
23 10 | # of teachers
24
25 22 | # of students
26
27 Not active | Active?
I'd like to "invert" it in another sheet to have Title, # of teachers, # of students and Active? as headers and then the values under the right column (each entry in a separate row).
I was trying to use MATCH without much luck..
This retrieves just the first Title (every time):
=index(SheetWithTable!$A:$A,match("Title",SheetWithTable!$B:$B,0))

Please copy your sheet and in that copy select B1:B7, Copy, Paste special into D1 with Paste transpose. In D3, copied across to J3 and down to suit:
=index($A1:$A11,match(D$1,$B1:$B11,0))
Select all, Copy, Paste special, Paste values only. Filter Column B to deselect # of teachers only and delete all rows but Row 1. Clear filter. Delete Columns I, G, E, C, B, A.

Try the following formula in cell D1:
={{B1,B3,B5,B7};{filter(A:A,B:B=B1),filter(A:A,B:B=B3),filter(A:A,B:B=B5),filter(A:A,B:B=B7)}}
Or you can try:
={{"Title","# of teachers","# of students","Active?"};{filter(A:A,B:B="Title"),filter(A:A,B:B="# of teachers"),filter(A:A,B:B="# of students"),filter(A:A,B:B="Active?")}}

If you have the 3 titles in a table, you can rotate it:
// Transpose a 2D array: rows become columns
var rotateArray = function(array) {
    var newArray = [];
    for (var i = 0; i < array[0].length; i++) {
        newArray[i] = [];
        for (var j = 0; j < array.length; j++) {
            newArray[i].push(array[j][i]);
        }
    }
    return newArray;
};
http://dtab.io/sheets/560b8efd6faeb39d2a70ad1e
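
To make the reshaping itself concrete, here is a small Python sketch of what the answers above achieve: collect the values for each label, then line them up as rows under the four headers. It is an illustration only, not a Sheets formula. The list pairs mirrors the non-empty rows of columns A and B from the question, and all variable names are assumptions for this example.

```python
from collections import defaultdict

# (value from column A, label from column B), non-empty rows only
pairs = [
    ("Title1", "Title"), (0, "# of teachers"), (11, "# of students"), ("Not active", "Active?"),
    ("Title2", "Title"), (3, "# of teachers"), (5, "# of students"), ("Not active", "Active?"),
    ("Title3", "Title"), (10, "# of teachers"), (22, "# of students"), ("Not active", "Active?"),
]

headers = ["Title", "# of teachers", "# of students", "Active?"]

# Equivalent of one FILTER per label: every value that carries that label, in order.
columns = defaultdict(list)
for value, label in pairs:
    columns[label].append(value)

# Transpose the per-label columns into one row per entry.
table = [headers] + [list(row) for row in zip(*(columns[h] for h in headers))]

for row in table:
    print(row)
```

This relies on every entry having all four labels, in the same way the FILTER formulas assume the filtered columns have equal length.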

stored procedure for fetching data from multiple tables

I have a stored procedure like this:
Select k.HBarcode, m.make, t.plateno, v.vtype, l.locname, mdl.model, c.Colname
from transaction_tbl t,
     KHanger_tbl k,
     make_tbl m,
     vtype_tbl v,
     Location_tbl l,
     Model_tbl mdl,
     Color_tbl c
where t.tbarcode = @carid and
      t.mkid = m.mkid and
      v.vtid = t.vtid and
      t.locid = l.locid and
      mdl.mdlid = t.mdlid and
      t.colid = c.colid and
      t.transactID = k.transactID
When I execute this, I get the following output:
HBarcode make plateno vtype locname model Colname
34 BMW 44554 Normal Fashion Avenue 520 Red
I have two more tables. From the transaction table I can get the transactID (t.transactID above), and with it the corresponding tid from KHanger_tbl. I then want to show the UniqueName for that tid from the Terminals table.
1-KHanger_tbl
transactid HBarcode tid
--------------------------------------- ----------------------------------
19 34 7
22 002 5
21 1 7
23 200005 6
2- Terminals_tbl
tid UniqueName
----------- --------------------------------------------------
5 Key Room-1
6 Podium -1
7 Key Room - 2
Expected output:
UniqueName   HBarcode make plateno vtype  locname        model Colname
-------------------------------------------------------------------------------
Key Room - 2 34       BMW  44554   Normal Fashion Avenue 520   Red
How can I write a stored procedure for this? If anyone knows, please help.

Something like this, maybe?
Select term.UniqueName,
       k.HBarcode, m.make, t.plateno, v.vtype, l.locname, mdl.model, c.Colname
from transaction_tbl t,
     KHanger_tbl k,
     make_tbl m,
     vtype_tbl v,
     Location_tbl l,
     Model_tbl mdl,
     Color_tbl c,
     Terminals_tbl term
where t.tbarcode = @carid and
      t.mkid = m.mkid and
      v.vtid = t.vtid and
      t.locid = l.locid and
      mdl.mdlid = t.mdlid and
      t.colid = c.colid and
      t.transactID = k.transactID and
      term.tid = k.tid -- match the terminal on the hanger's tid, not on the transaction id
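
The lookup chain the question describes (transaction to hanger via transactID, then hanger to terminal via tid) can be tried out on a cut-down schema. The Python sketch below uses an in-memory SQLite database rather than SQL Server; the three tables keep only the columns needed for the join, and the barcode value 'car-1' is a made-up placeholder, since the question does not show the real tbarcode. The KHanger_tbl and Terminals_tbl rows are copied from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified tables: only the columns needed for the join (an assumption)
    CREATE TABLE transaction_tbl (transactID INTEGER, tbarcode TEXT, plateno TEXT);
    CREATE TABLE KHanger_tbl (transactid INTEGER, HBarcode TEXT, tid INTEGER);
    CREATE TABLE Terminals_tbl (tid INTEGER, UniqueName TEXT);

    -- 'car-1' is a hypothetical barcode
    INSERT INTO transaction_tbl VALUES (19, 'car-1', '44554');
    INSERT INTO KHanger_tbl VALUES (19, '34', 7), (22, '002', 5), (21, '1', 7), (23, '200005', 6);
    INSERT INTO Terminals_tbl VALUES (5, 'Key Room-1'), (6, 'Podium -1'), (7, 'Key Room - 2');
""")

query = """
    SELECT term.UniqueName, k.HBarcode, t.plateno
    FROM transaction_tbl t
    JOIN KHanger_tbl k ON k.transactid = t.transactID
    JOIN Terminals_tbl term ON term.tid = k.tid   -- terminal is found through the hanger's tid
    WHERE t.tbarcode = ?
"""
rows = conn.execute(query, ("car-1",)).fetchall()
print(rows)
```

Joining Terminals_tbl on k.tid returns the terminal attached to the hanger row, which is what the expected output in the question shows.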
