Filtering Rows in F#

I have some code that builds the following frame and then filters its rows:
let dfff=
[ "year" => series [ 1 => 1990.0; 2 => 1991.00; 3 => 1992.0; 4 => 1993.0]
"gold" => series [ 1 => 10.0; 2 => 10.00; 3 => 15.0; 4 => 20.0]
"silver" => series [ 1 => 20.0; 2 => 30.00; 3 => 45.0; 4 => 55.0] ]
|> frame
let dfff2 = dfff |> Frame.filterRows (fun key row -> row?year <= 1992.0 )
Why do I have to write key in
Frame.filterRows (fun key row -> row?year <= 1992.0)
if my function only depends on row? What role does key play here? I would appreciate it if anybody could explain the logic. Thanks!

In Deedle, frames have row keys and column keys. In your case, you have Frame<int, string> meaning that the row keys are integers (just numbers) and column keys are strings (column names) - but you might also have dates or other more interesting things as row keys.
The filterRows function gives you the row key together with the row data. The key parameter is just the row key. In your case it is an (uninteresting) int index, but in other scenarios it might be something useful, such as a date.
F# lets you write _ to explicitly ignore the value:
let dfff2 = dfff |> Frame.filterRows (fun _ row -> row?year <= 1992.0 )
In the Series module, we have Series.filter and Series.filterValues where the first one gives you key & value and the second one gives you just the value. So, we could follow the same pattern and add Frame.filterRowValues.
This would actually be quite easy, so if you want to contribute, please send a pull request with a change somewhere around here :-).
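To see why the callback receives both arguments, here is a minimal sketch of the same idea in Python rather than F#. The filter_rows helper and the dictionary-of-rows layout are hypothetical stand-ins for a Deedle frame, not Deedle's API.

```python
# Hypothetical stand-in for a frame: row key -> row (column name -> value).
frame = {
    1: {"year": 1990.0, "gold": 10.0, "silver": 20.0},
    2: {"year": 1991.0, "gold": 10.0, "silver": 30.0},
    3: {"year": 1992.0, "gold": 15.0, "silver": 45.0},
    4: {"year": 1993.0, "gold": 20.0, "silver": 55.0},
}


def filter_rows(predicate, rows):
    """Keep the rows for which predicate(key, row) is true."""
    return {key: row for key, row in rows.items() if predicate(key, row)}


# The predicate ignores the key, like `fun _ row -> ...` in F#.
by_value = filter_rows(lambda _key, row: row["year"] <= 1992.0, frame)

# The predicate uses only the key, which is why the key is passed at all.
by_key = filter_rows(lambda key, _row: key >= 3, frame)

print(sorted(by_value))  # [1, 2, 3]
print(sorted(by_key))    # [3, 4]
```

With integer row keys the second form is rarely needed, but with dates as keys it lets you filter on the key without touching the row data.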

Related

How do I sort a simple Lua table alphabetically?

I have already seen many threads with examples of how to do this, the problem is, I still can't do it.
All the examples have tables with extra data. For example, something like this
lines = {
luaH_set = 10,
luaH_get = 24,
luaH_present = 48,
}
or this,
obj = {
{ N = 'Green1' },
{ N = 'Green' },
{ N = 'Sky blue99' }
}
I can code in a few languages but I'm very new to Lua, and tables are really confusing to me. I can't seem to work out how to adapt the code in the examples to be able to sort a simple table.
This is my table:
local players = {"barry", "susan", "john", "wendy", "kevin"}
I want to sort these names alphabetically. I understand that Lua tables don't preserve order, and that's what's confusing me. All I essentially care about doing is just printing these names in alphabetical order, but I feel I need to learn this properly and know how to index them in the right order to a new table.
The examples I see are like this:
local function cmp(a, b)
a = tostring(a.N)
b = tostring(b.N)
local patt = '^(.-)%s*(%d+)$'
local _,_, col1, num1 = a:find(patt)
local _,_, col2, num2 = b:find(patt)
if (col1 and col2) and col1 == col2 then
return tonumber(num1) < tonumber(num2)
end
return a < b
end
table.sort(obj, cmp)
for i,v in ipairs(obj) do
print(i, v.N)
end
or this:
function pairsByKeys (t, f)
local a = {}
for n in pairs(t) do table.insert(a, n) end
table.sort(a, f)
local i = 0 -- iterator variable
local iter = function () -- iterator function
i = i + 1
if a[i] == nil then return nil
else return a[i], t[a[i]]
end
end
return iter
end
for name, line in pairsByKeys(lines) do
print(name, line)
end
and I'm just absolutely thrown by this as to how to do the same thing for a simple 1D table.
Can anyone please help me to understand this? I know if I can understand the most basic example, I'll be able to teach myself these harder examples.

local players = {"barry", "susan", "john", "wendy", "kevin"}
-- sort ascending, which is the default
table.sort(players)
print(table.concat(players, ", "))
-- sort descending
table.sort(players, function(a,b) return a > b end)
print(table.concat(players, ", "))
Here's why:
Your table players is a sequence.
local players = {"barry", "susan", "john", "wendy", "kevin"}
Is equivalent to
local players = {
[1] = "barry",
[2] = "susan",
[3] = "john",
[4] = "wendy",
[5] = "kevin",
}
If you do not provide keys in the table constructor, Lua will use integer keys automatically.
A table like that can be sorted by its values. Lua will simply rearrange the index-value pairs according to the return value of the compare function. By default this is
function (a,b) return a < b end
If you want any other order, you need to provide a function that returns true if element a comes before b.
Read this https://www.lua.org/manual/5.4/manual.html#pdf-table.sort
table.sort
Sorts the list elements in a given order, in-place, from list[1] to
list[#list]
This example is not a "list" or sequence:
lines = {
luaH_set = 10,
luaH_get = 24,
luaH_present = 48,
}
Which is equivalent to
lines = {
["luaH_set"] = 10,
["luaH_get"] = 24,
["luaH_present"] = 48,
}
It only has strings as keys. It has no order. You need a helper sequence to map some order onto that table's elements.
The second example
obj = {
{ N = 'Green1' },
{ N = 'Green' },
{ N = 'Sky blue99' }
}
which is equivalent to
obj = {
[1] = { N = 'Green1' },
[2] = { N = 'Green' },
[3] = { N = 'Sky blue99' },
}
Is a list. So you could sort it. But sorting it by table values wouldn't make too much sense. So you need to provide a function that gives you a reasonable way to order it.
Read this so you understand what a "sequence" or "list" is in this regard. Those names are used for other things as well. Don't let it confuse you.
https://www.lua.org/manual/5.4/manual.html#3.4.7
It is basically a table that has consecutive integer keys starting at 1.
Understanding this difference is one of the most important concepts while learning Lua. The length operator, ipairs and many functions of the table library only work with sequences.
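The difference between a sequence and a keyed table can be sketched in Python as a rough analogy (not Lua semantics): a list stands in for the sequence and a dict for the table with string keys. Note that Python dicts do keep insertion order, unlike Lua tables, so only the helper-sequence pattern carries over.

```python
# A list plays the role of a Lua sequence: it can be sorted in place.
players = ["barry", "susan", "john", "wendy", "kevin"]
players.sort()
print(players)  # ['barry', 'john', 'kevin', 'susan', 'wendy']

# A dict with string keys plays the role of `lines`. To visit it in a
# chosen order, build a helper list of its keys and sort that list.
lines = {"luaH_set": 10, "luaH_get": 24, "luaH_present": 48}
ordered_keys = sorted(lines)
ordered_pairs = [(key, lines[key]) for key in ordered_keys]
print(ordered_pairs)
# [('luaH_get', 24), ('luaH_present', 48), ('luaH_set', 10)]
```

This is the same thing pairsByKeys does in the question: collect the keys into a sequence, sort the sequence, then look the values up in that order.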

This is my table:
local players = {"barry", "susan", "john", "wendy", "kevin"}
I want to sort these names alphabetically.
All you need is table.sort(players)
I understand that LUA tables don't preserve order.
Order of fields in a Lua table (a dictionary with arbitrary keys) is not preserved.
But your Lua table is an array, it is self-ordered by its integer keys 1, 2, 3,....

To clear up the confusion regarding "not preserving order": what isn't preserved is the order of the keys, in particular string keys, i.e. when you use the table as a dictionary rather than as an array. If you write myTable = {orange="hello", apple="world"}, the fact that you defined key orange to the left of key apple isn't stored. If you enumerate keys/values using for k, v in pairs(myTable) do print(k, v) end, the order in which the entries come out is unspecified: apple world may well come before orange hello, and you cannot rely on either order.
You don't have this problem with numeric keys, which is what the keys are by default if you don't specify them. myTable = {"hello", "world", foo="bar"} is the same as myTable = {[1]="hello", [2]="world", foo="bar"}, i.e. it assigns myTable[1] = "hello", myTable[2] = "world" and myTable.foo = "bar" (same as myTable["foo"]). Even if pairs returned the numeric keys in a random order, it wouldn't matter, since you could still loop through them by incrementing an index.
You can use table.sort which, if no order function is given, sorts the values using <. For numbers the result is ascending order; for strings the comparison follows the current locale, which in the default C locale means ordering by character code (ASCII):
local players = {"barry", "susan", "john", "wendy", "kevin"}
table.sort(players)
-- players is now {"barry", "john", "kevin", "susan", "wendy"}
This will, however, fall apart if you have mixed lowercase and uppercase entries, because uppercase letters sort before lowercase ones due to their lower ASCII codes. It also won't work properly with non-ASCII characters like umlauts (in the default locale they sort after the ASCII letters), so it's not a dictionary-style alphabetical sort.
You can, however, supply your own ordering function, which receives arguments (a, b) and needs to return true if a should come before b. Here is an example that fixes the lowercase/uppercase issue by converting to uppercase before comparing:
table.sort(players, function (a, b)
return string.upper(a) < string.upper(b)
end)
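For comparison, the same case problem and the same fix can be sketched in Python. This is an analogy only; the mixed-case player names are made up for illustration, and functools.cmp_to_key is used to adapt a Lua-style "comes before" function to Python's sort.

```python
from functools import cmp_to_key

# Hypothetical mixed-case input to show the problem.
players = ["barry", "Susan", "john", "Wendy", "kevin"]

# Default comparison is by code point, so uppercase names come first.
default_order = sorted(players)
print(default_order)  # ['Susan', 'Wendy', 'barry', 'john', 'kevin']


def before(a, b):
    """Lua-style order function: true if a should come before b."""
    return a.upper() < b.upper()


def as_cmp(a, b):
    """Adapt the boolean order function to a -1/0/1 comparator."""
    if before(a, b):
        return -1
    if before(b, a):
        return 1
    return 0


case_insensitive = sorted(players, key=cmp_to_key(as_cmp))
print(case_insensitive)  # ['barry', 'john', 'kevin', 'Susan', 'Wendy']
```

In Python the shorter spelling would be sorted(players, key=str.upper); the comparator form is shown because it mirrors the function passed to table.sort.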

Google sheet array formula with Filter & Join

I need to build a table based on the following data:
Ref | Product
R1 | ProdA
R2 | ProdC
R1 | ProdB
R3 | ProdA
R4 | ProdC
And here the result I need:
My Product | All Ref
ProdA | R1#R3
ProdC | R2#R4
The catch is that the 'My Product' column is computed elsewhere. So I need an ARRAYFORMULA based on the 'My Product' column that looks in the first table to build the 'All Ref' column.
I know that ARRAYFORMULA is not compatible with FILTER and JOIN. I expect a solution like the one in "Google sheet array formula + Join + Filter", but I'm not sure I understand all the steps or whether it really fits my case.
Hope you can help.

You could try something like this:
CREDIT: player0 for the method shared to similar questions
=ARRAYFORMULA(substitute(REGEXREPLACE(TRIM(SPLIT(TRANSPOSE(
QUERY(QUERY({B2:B&"😊", A2:A&"#"},
"select max(Col2)
where Col1 !=''
group by Col2
pivot Col1"),,999^99)), "😊")), "#$", )," ",""))
Step by step:

Instead of the workaround hacks I implemented a simple joinMatching(matches, values, texts, [sep]) function in Google Apps Script.
In your case it would be just =joinMatching(MyProductColumn, ProductColumn, RefColumn, "#").
Source:
// Google Apps Script to join texts in a range where values in second range equal to the provided match value
// Solves the need for `arrayformula(join(',', filter()))`, which does not work in Google Sheets
// Instead you can pass a range of match values and get a range of joined texts back
const identity = data => data
const onRange = (data, fn, args, combine = identity) =>
Array.isArray(data)
? combine(data.map(value => onRange(value, fn, args)))
: fn(data, ...(args || []))
const _joinMatching = (match, values, texts, sep = '\n') => {
const columns = texts[0]?.length
if (!columns) return ''
const row = i => Math.floor(i / columns)
const col = i => i % columns
const value = i => values[row(i)][col(i)]
return (
// JSON.stringify(match) +
texts
.flat()
// .map((t, i) => `[${row(i)}:${col(i)}] ${t} (${JSON.stringify(value(i))})`)
.filter((_, i) => value(i) === match)
.join(sep)
)
}
const joinMatching = (matches, values, texts, sep) =>
onRange(matches, _joinMatching, [values, texts, sep])
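The matching-and-joining logic can be checked outside of Sheets with a short Python sketch. The join_matching function below is a hypothetical single-column simplification of the script above (it does not handle 2D ranges), run on the sample data from the question.

```python
def join_matching(matches, values, texts, sep="#"):
    """For each match, join the texts whose paired value equals it."""
    return [
        sep.join(text for value, text in zip(values, texts) if value == match)
        for match in matches
    ]


# Sample data from the question, one list per column.
refs = ["R1", "R2", "R1", "R3", "R4"]
products = ["ProdA", "ProdC", "ProdB", "ProdA", "ProdC"]
my_products = ["ProdA", "ProdC"]

all_refs = join_matching(my_products, products, refs)
print(all_refs)  # ['R1#R3', 'R2#R4']
```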

How to summarize values by key of an Erlang map list?

This is what my list of maps looks like:
Map = [#{votes=>3, likes=>20, views=> 100},#{votes=>0, likes=>1, views=> 70},#{votes=>1, likes=>14, views=> 2000}].
I would like to return a summary of all map entries. I have attempted to solve this with fun()s, but the logic does not make sense, and I only got non-executable code.
The problem is that one cannot change variables in Erlang, otherwise this would work:
Summary = #{
votes=>0,
likes=>0,
views=>0,
},
[maps:update(Key, maps:get(Key, MapItem) + maps:get(Key, Summary), Summary) || MapItem <- Map, Key <- [votes, likes, views]].
How ought one go about this and successfully summarize the values of a list of maps?

The fold family of functions is designed for exactly this situation. In your case, the following code calculates a map containing the per-key totals of the maps in the list:
MapsList = [#{votes=>3, likes=>20, views=> 100},
#{votes=>0, likes=>1, views=> 70},
#{votes=>1, likes=>14, views=> 2000}],
Summary = lists:foldl(fun (Map, AccL) ->
maps:fold(fun (Key, Value, Acc) ->
Acc#{Key => Value + maps:get(Key, Acc, 0)}
end, AccL, Map)
end, #{}, MapsList).
Summary value is the map #{votes => 4, likes => 35, views => 2170}.
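The nested fold can be sketched in Python with functools.reduce to make the accumulator pattern explicit. This is an analogy, not Erlang; add_map is a hypothetical helper name, and it returns a new dict on every step to mirror the fact that the Erlang version never mutates a variable.

```python
from functools import reduce

maps_list = [
    {"votes": 3, "likes": 20, "views": 100},
    {"votes": 0, "likes": 1, "views": 70},
    {"votes": 1, "likes": 14, "views": 2000},
]


def add_map(acc, item):
    """Inner fold: return a new accumulator with item's values added in."""
    merged = dict(acc)  # copy, so the previous accumulator is untouched
    for key, value in item.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Outer fold over the list, starting from an empty map.
summary = reduce(add_map, maps_list, {})
print(summary)  # {'votes': 4, 'likes': 35, 'views': 2170}
```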

Lambda : GroupBy multiple columns where one of them is DbFunctions.TruncateTime()

NOTE : This is NOT a duplicate of this useful SO question, my problem is all about the TruncateTime inside the GroupBy clause. See explanations below :
I'd like to use DbFunctions.TruncateTime in a multiple GroupBy clause, but it doesn't seem to work in my ASP.NET MVC5 project.
Here is a first lambda I wrote, it gives me the total number of views per day for a set of data.
It gives me the expected result :
var qViews = dbContext.TABLE_C.Where(c => c.IdUser == 1234)
.Join(dbContext.TABLE_V.Where(v => v.Date > DbFunctions.AddMonths(DateTime.Now, -1)), c => c.Id, v => v.Id, (c, v) => new { c, v })
.GroupBy(x => DbFunctions.TruncateTime(x.v.MyDateTimeColumn))
.Select(g => new
{
Date = (DateTime)g.Key,
NbViews = g.Count(),
}).ToDictionary(p => p.Date, p => p.NbViews);
Result is something like that :
...
Date | Views
03/07/2018 | 15
03/08/2018 | 8
03/09/2018 | 23
Now, I'd like a more detailed result, with the number of views per day AND PER ITEM on the same set of data.
Here is what I'd like to write :
var qViews = dbContext.TABLE_C.Where(c => c.IdUser == 1234)
.Join(dbContext.TABLE_V.Where(v => v.Date > DbFunctions.AddMonths(DateTime.Now, -1)), c => c.Id, v => v.Id, (c, v) => new { c, v })
.GroupBy(x => new { DbFunctions.TruncateTime(x.v.MyDateTimeColumn), x.c.Id}) // Issue #1
.Select(g => new
{
Date = g.Key.Date, //Issue #2
NbViews = g.Count(),
}).ToDictionary(p => p.Date, p => p.NbViews);
And I expected something like that :
...
Date | Views | ID Item
03/07/2018 | 4 | 456789
03/07/2018 | 11 | 845674
03/08/2018 | 6 | 325987
03/08/2018 | 1 | 548965
03/08/2018 | 1 | 222695
03/09/2018 | 23 | 157896
So, this query has two issues (see comments above):
Issue #1 : It seems I can't GroupBy multiple columns when one of them uses DbFunctions. If I use .GroupBy(x => new { x.v.MyDateTimeColumn, x.c.Id }), the code compiles but doesn't give me the expected result, as I want to group by date, not date + time.
Issue #2 : Date = g.Key.Date, is rejected by the compiler. When I type g.Key, autocompletion only suggests the Id column; it doesn't see the truncated date.
Why can't I GroupBy multiple columns when one of them is a truncated date?
Is there any workaround?

You need to give your anonymous type's properties names if you want to use them later on:
.GroupBy(x => new
{ Date = DbFunctions.TruncateTime(x.v.MyDateTimeColumn),
Id = x.c.Id
})
Then you can project on that:
.Select(g => new
{
Date = g.Key.Date,
NbViews = g.Count(),
})
And finally you cannot do this:
.ToDictionary(p => p.Date, p => p.NbViews);
because you will get this error:
An item with the same key has already been added.
Why? Because Date is not unique: you grouped by Date and Id, so the same Date appears once per Id. The same thing happens with this list of strings:
var nums = new List<string> { "1", "1", "1", "2" };
nums.ToDictionary(x => x, x => x);
But, perhaps, you may want to do this:
var lu = nums.ToLookup(x => x, x => x);
And now you can look them up:
// Returns 3 items since there are 3 "1"s
IEnumerable<string> ones = lu["1"];
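The grouping and the key collision can be sketched in Python to show the shape of the data at each step. This is an analogy with made-up view records, not Entity Framework code: calling .date() on a datetime plays the role of TruncateTime, a tuple plays the role of the anonymous composite key, and a dict of lists plays the role of ToLookup.

```python
from collections import Counter, defaultdict
from datetime import date, datetime

# Hypothetical view records: (timestamp, item id).
views = [
    (datetime(2018, 3, 7, 9, 15), 456789),
    (datetime(2018, 3, 7, 14, 2), 456789),
    (datetime(2018, 3, 7, 18, 40), 845674),
    (datetime(2018, 3, 8, 8, 5), 325987),
]

# Group by the composite key (truncated date, item id) and count.
per_day_and_item = Counter((when.date(), item_id) for when, item_id in views)
print(per_day_and_item[(date(2018, 3, 7), 456789)])  # 2

# A plain dict keyed by date alone cannot hold both items of 7 March.
# A lookup (date -> list of entries) keeps all of them.
by_date = defaultdict(list)
for (day, item_id), count in per_day_and_item.items():
    by_date[day].append((item_id, count))

print(by_date[date(2018, 3, 7)])  # [(456789, 2), (845674, 1)]
```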

Spark join hangs

I have a table with n columns that I'll call A. In this table there are three columns that I need:
vat -> String
tax -> String
card -> String
vat or tax can be null, but not at the same time.
For every unique pair of vat and tax there is at least one card.
I need to alter this table by adding a column card_count containing a text label based on the number of cards each unique combination of tax and vat has.
So I've done this:
val cardCount = A.groupBy("tax", "vat").count
val sqlCard = udf((count: Int) => {
if (count > 1)
"MULTI"
else
"MONO"
})
val B = cardCount.withColumn(
"card_count",
sqlCard(cardCount.col("count"))
).drop("count")
In the table B I have three columns now:
vat -> String
tax -> String
card_count -> String
and every operation on this DataFrame is smooth.
Now, because I wanted to bring the new column into table A, I performed the following join:
val result = A.join(B,
B.col("tax")<=>A.col("tax") and
B.col("vat")<=>A.col("vat")
).drop(B.col("tax"))
.drop(B.col("vat"))
I expected to get the original table A with the added column card_count.
The problem is that the join hangs, consuming all system resources and freezing the PC.
Additional details:
Table A has ~1.5M elements and is read from parquet file;
Table B has ~1.3M elements.
The system has 8 threads and 30 GB of RAM.
Let me know what I'm doing wrong.

In the end, I didn't find out what the issue was, so I changed my approach:
val cardCount = A.groupBy("tax", "vat").count
val cardCountSet = cardCount.filter(cardCount.col("count") > 1)
.rdd.map(r => r(0) + " " + r(1)).collect().toSet
val udfCardCount = udf((tax: String, vat:String) => {
if (cardCountSet.contains(tax + " " + vat))
"MULTI"
else
"MONO"
})
val result = A.withColumn("card_count",
udfCardCount(A.col("tax"), A.col("vat")))
If someone knows a better approach, let me know.
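The labelling logic itself can be sketched in plain Python (not Spark) to check it on a few rows. The rows below are made up for illustration. Using a (tax, vat) tuple as the key is a design choice that avoids building a concatenated string, so a null in either column needs no special handling.

```python
from collections import Counter

# Hypothetical rows; either tax or vat may be None, but not both.
rows = [
    {"tax": "T1", "vat": None, "card": "C1"},
    {"tax": "T1", "vat": None, "card": "C2"},
    {"tax": None, "vat": "V9", "card": "C3"},
    {"tax": "T2", "vat": "V2", "card": "C4"},
]

# Count cards per (tax, vat) pair; tuples with None are valid dict keys.
counts = Counter((row["tax"], row["vat"]) for row in rows)

# Attach the label to every original row.
result = [
    {**row, "card_count": "MULTI" if counts[(row["tax"], row["vat"])] > 1 else "MONO"}
    for row in rows
]

print([row["card_count"] for row in result])
# ['MULTI', 'MULTI', 'MONO', 'MONO']
```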
