Apply List.flatten to a Stream

The following pipe:
items
...
|> Stream.map(&process/1)
Generates this kind of structure:
[ [], [], [], [] ]
and I would like it to be a flattened list.
Without using streams I would just do:
|> Enum.map(&process/1)
|> List.flatten
But I would like to keep it as a stream, and I can't figure out how to apply List.flatten and still get a Stream.

You could try using Stream.flat_map/2. Something like this should help:
items
...
|> Stream.flat_map(&process/1)
This will lazily process each element of items and flatten the results by one level (unlike List.flatten, it does not flatten deeper nested lists).
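For readers who do not use Elixir, the same idea can be sketched in Python: map lazily, then yield the elements of each result one level deep. This is only an illustration of the technique, not Elixir's implementation. The names process and flat_map are hypothetical stand-ins, and process is assumed to return a list for each item.

```python
from itertools import chain, count, islice


def process(item):
    # Hypothetical stand-in for the question's process/1:
    # it returns a list of results for each input item.
    return [item, item * 10]


def flat_map(func, iterable):
    # Lazily apply func to each element and yield the elements of each
    # result, flattening exactly one level.
    return chain.from_iterable(map(func, iterable))


# The source is infinite, so this only works because evaluation is lazy.
lazy = flat_map(process, count(1))
first_five = list(islice(lazy, 5))
print(first_five)
```

Because nothing is computed until elements are requested, the source can be unbounded, which is the property the question wants to keep by staying with streams.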
Hope that helps!


What is the syntax to use List.AddRange with an F# list?

Let's say I have:
let mutableList = List<string>()
let someList = [ "a"; "b"; "c" ]
how can I do something like this:
mutableList.AddRange(someList)
I have a case when I need to add elements to a list based on some dynamic criteria and the elements arrive as lists.
I understand I can iterate through my list and add them one by one, but I'm curious if there is a quick / clean syntax to make the immutable lists work with AddRange?
I'm not sure if I understood your question correctly. The code using AddRange in your snippet works fine (if you open the System.Collections.Generic namespace).
I would probably use ResizeArray<string> rather than List<string>. This is just the F# alias for the same type, but it is easier to understand and does not require opening an extra namespace.
If you are asking how to construct an immutable list in a way that is a bit like using AddRange, then my recommendation would be to look at list comprehensions.
Let's say we have the following (completely meaningless) code using a mutable list:
let l = ResizeArray<string>()
for x in 0 .. 10 do
    if x % 7 = 0 then
        l.AddRange(["hey"; string x])
To do the same thing using an immutable list, you can write:
let l =
    [ for x in 0 .. 10 do
        if x % 7 = 0 then
            yield! ["hey"; string x] ]
You can't add a range directly on an immutable list. By definition that list can't be changed. But you can create a new immutable list out of two other lists.
You can use the List.append function.
let goodol'list = []
let extendedList = List.append goodol'list [ "a"; "b"; "c" ]
Another way, as Abel pointed out in the comments, is to use the @ operator as a shorthand.
let goodol'list = []
let extendedList = goodol'list @ [ "a"; "b"; "c" ]
As for how to dynamically add items to an immutable list, see the answer provided by Tomas Petricek.

F# todo list using immutable objects

I'm trying to figure out how to do a to do list in F# using immutable objects. The to do list (not necessarily an F# list) might be pulled from a database or collected from user input or read from XML or JSON, etc. That part is not so important.
Pseudo code:
do for some length of time:
    for each item in the to do list:
        if item is ready to do:
            do item
            if it worked:
                remove from the todo list
    wait a bit before trying again
report on items that weren't ready or that failed.
The to do list will be some collection of F# records which will have at least an instruction ("Send Email", "Start a process", "Copy a File", "Ask for a raise") along with parameters as a sub-collection.
Can such a thing be done with immutable objects alone? Or must I use a .NET List or some other mutable object?
I don't need fully-fleshed out working code, just some ideas about how I'd put such a thing together.
UPDATE: First attempt at (half-)coding this thing:
let processtodo list waittime deadline =
    let rec inner list newlist =
        match list with
        | [] when not List.isEmpty newlist ->
            inner newlist []
        | head :: tail when head.isReady->
            let res = head.action
            inner tail ( if res = true then tail else list)
        | head :: tail when not head.isReady ->
            inner tail list
        | _ when deadline not passed ->
            // wait for the waittime
            inner list
        | _ -> report on unfinished list
    inner list []
I tried to write this in the typical fashion seen in many examples. I assumed that the items support the "isReady" and "action" methods. The thing I don't like is that it's not tail-call recursive, so it will consume stack space for each recursion.
Recursion and/or continuations are the typical strategies to transform code with mutable structures in loops to immutable structures. If you know how to write a recursive "List.filter" then you'll probably have some ideas to be on the right track.

What is the Erlang way to do stream manipulations?

Suppose I wanted to do something like:
dict
.values()
.map(fun scrub/1)
.flatMap(fun split/1)
.groupBy(fun keyFun/1, fun count/1)
.to_dict()
What is the most elegant way to achieve this in Erlang?
There is no direct, easy way of doing that. All the attempts I have seen looked even worse than straightforward composition. If you look at the majority of open source projects in Erlang, you will find that they use plain function composition. Reusing your example:
to_dict(
  groupBy(fun keyFun/1, fun count/1,
    flatMap(fun split/1,
      map(fun scrub/1,
        values(dict))))).
This isn't a construct that's natural in Erlang. If you have a couple functions, regular composition is what I'd use:
lists:flatten(lists:map(fun (A) ->
                            do_stuff(A)
                        end,
                        generate_list())).
For a longer series of operations, intermediary variables:
Dict = #{hello => world, ...},
Values = maps:values(Dict),
ScrubbedValues = lists:map(fun scrub/1, Values),
SplitValues = lists:flatten(lists:map(fun split/1, ScrubbedValues)),
GroupedValues = basil_lists:group_by(fun keyFun/1, fun count/1, SplitValues),
Dict2 = maps:from_list(GroupedValues).
That's how it'd look if you wanted all of those operations grouped in one shot together.
However, I'd more likely write this in a different way:
-spec remap_values(map()) -> map().
remap_values(Map) ->
    map_values(maps:values(Map)).
-spec map_values(list()) -> map().
map_values(Values) ->
    map_values(Values, [], []).
-spec map_values(list(), list(), list()) -> map().
map_values([], OutList, OutGroup) ->
    %% Base case: transform into a map
    Grouped = lists:zip(OutGroup, OutList),
    lists:foldl(fun ({Group, Element}, Acc = #{Group := Existing}) ->
                        Acc#{Group => [Element | Existing]};
                    ({Group, Element}, Acc) ->
                        Acc#{Group => [Element]}
                end,
                #{},
                Grouped);
map_values([First|Rest], OutList, OutGroup) ->
    %% Recursive case: process the first element and categorize the result
    Processed = split(scrub(First)),
    Categories = lists:map(fun categorize/1, Processed),
    map_values(Rest, OutList ++ Processed, OutGroup ++ Categories).
The actual correct implementation depends a lot on how the code's going to be run -- what I've written here is pretty simple, but might not perform well on large amounts of data. If you're actually looking to process an endless stream of data you'll need to write that yourself (though you may find Gen Servers to be a very useful framework for doing so).

Affix a string to items within a lists:flatten in Erlang?

I have a list like this one, ['a','b','c','d'], and what I need is to add an affix to each item in that list, like: ['a#erlang','b#erlang','c#erlang','d#erlang']
I tried using lists:foreach, concatenating the two strings into one and then using lists:append to add the result to the main list, but that didn't work for me.
Example of what I tried:
LISTa = [],
lists:foreach(fun (Item) ->
LISTa = lists:append([Item,<<"#erlang">>])
end,['a','b','c','d'])
Thanks in advance.
1> L = ['a','b','c','d'].
[a,b,c,d]
2> [ list_to_atom(atom_to_list(X) ++ "#erlang") ||X <- L].
['a#erlang','b#erlang','c#erlang','d#erlang']
Please try this code, you can use list_to_atom and atom_to_list.
This will do the trick (using list comprehensions):
1> L = ["a","b","c","d"].
["a","b","c","d"]
2> R = [X ++ "#erlang" || X <- L].
["a#erlang","b#erlang","c#erlang","d#erlang"]
3>
Notice that I changed the atoms to strings: creating atoms dynamically is discouraged in Erlang, so I tend to avoid it. If you still need atoms, change the implementation a little bit and you are good to go.
NOTE: I'm assuming the concatenation between atoms and binaries is, somehow, something you did not do on purpose.

How to do a left join on a non unique column/index in Deedle

I am trying to do a left join between two data frames in Deedle. Examples of the two data frames are below:
let workOrders =
    Frame.ofColumns [
        "workOrderCode" =?> series [ (20050,20050); (20051,20051); (20060,20060) ]
        "workOrderDescription" =?> series [ (20050,"Door Repair"); (20051,"Lift Replacement"); (20060,"Window Cleaning") ]]
// This does not compile due to the duplicate Work Order Codes
let workOrderScores =
    Frame.ofColumns [
        "workOrderCode" => series [ (20050,20050); (20050,20050); (20051,20051) ]
        "runTime" => series [ (20050,20100112); (20050,20100130); (20051,20100215) ]
        "score" => series [ (20050,100); (20050,120); (20051,80) ]]
Frame.join JoinKind.Outer workOrders workOrderScores
The problem is that Deedle will not let me create a data frame with a non unique index and I get the following error: System.ArgumentException: Duplicate key '20050'. Duplicate keys are not allowed in the index.
Interestingly in Python/Pandas I can do the following which works perfectly. How can I reproduce this result in Deedle? I am thinking that I might have to flatten the second data frame to remove the duplicates then join and then unpivot/unstack it?
workOrders = pd.DataFrame(
{'workOrderCode': [20050, 20051, 20060],
'workOrderDescription': ['Door Repair', 'Lift Replacement', 'Window Cleaning']})
workOrderScores = pd.DataFrame(
{'workOrderCode': [20050, 20050, 20051],
'runTime': [20100112, 20100130, 20100215],
'score' : [100, 120, 80]})
pd.merge(workOrders, workOrderScores, on = 'workOrderCode', how = 'left')
# Result:
#    workOrderCode workOrderDescription   runTime  score
# 0          20050          Door Repair  20100112    100
# 1          20050          Door Repair  20100130    120
# 2          20051     Lift Replacement  20100215     80
# 3          20060      Window Cleaning       NaN    NaN
This is a great question - I have to admit, there is currently no elegant way to do this with Deedle. Could you please submit an issue to GitHub to make sure we keep track of this and add some solution?
As you say, Deedle does not let you have duplicate values in the keys currently - although, your Pandas solution also does not use duplicate keys - you simply use the fact that Pandas lets you specify the column to use when joining (and I think this would be great addition to Deedle).
Here is one way to do what you wanted - but not very nice. I think using pivoting would be another option (there is a nice pivot table function in the latest source code - not yet on NuGet).
I used Frame.groupRowsByInt and Frame.nest to turn your data frames into series grouped by the workOrderCode (each item now contains a frame with all rows that have the same work order code):
let workOrders =
    Frame.ofColumns [
        "workOrderCode" =?> Series.ofValues [ 20050; 20051; 20060 ]
        "workOrderDescription" =?> Series.ofValues [ "Door Repair"; "Lift Replacement"; "Window Cleaning" ]]
    |> Frame.groupRowsByInt "workOrderCode"
    |> Frame.nest
let workOrderScores =
    Frame.ofColumns [
        "workOrderCode" => Series.ofValues [ 20050; 20050; 20051 ]
        "runTime" => Series.ofValues [ 20100112; 20100130; 20100215 ]
        "score" => Series.ofValues [ 100; 120; 80 ]]
    |> Frame.groupRowsByInt "workOrderCode"
    |> Frame.nest
Now we can join the two series (because their work order codes are the keys). However, then you get one or two data frames for each joined order code and there is quite a lot of work needed to outer join the rows of the two frames:
// Join the two series to align frames with the same work order code
Series.zip workOrders workOrderScores
|> Series.map(fun _ (orders, scores) ->
    match orders, scores with
    | OptionalValue.Present s1, OptionalValue.Present s2 ->
        // There is a frame with some rows with the specified code in both
        // work orders and work order scores - we return a cross product of their rows
        [ for r1 in s1.Rows.Values do
            for r2 in s2.Rows.Values do
                // Drop workOrderCode from one series (they are the same in both)
                // and append the two rows & return that as the result
                yield Series.append r1 (Series.filter (fun k _ -> k <> "workOrderCode") r2) ]
        |> Frame.ofRowsOrdinal
    // If left or right value is missing, we just return the columns
    // that are available (others will be filled with NaN)
    | OptionalValue.Present s, _
    | _, OptionalValue.Present s -> s)
|> Frame.unnest
|> Frame.indexRowsOrdinally
This might be slow (especially in the NuGet version). If you work on more data, please try building latest version of Deedle from sources (and if that does not help, please submit an issue - we should look into this!)
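The group-then-cross-product idea used above does not depend on Deedle, so here is a minimal sketch of it in plain Python, using the sample data from the question. It is only an illustration of the technique, not Deedle's or pandas' implementation. The function name left_join is hypothetical, and rows are assumed to be plain dictionaries.

```python
from collections import defaultdict


def left_join(left_rows, right_rows, key):
    # Group the right-hand rows by key; duplicate keys are fine here.
    groups = defaultdict(list)
    for row in right_rows:
        groups[row[key]].append(row)

    # Columns that exist only on the right, used to pad unmatched left rows.
    missing = {col: None for row in right_rows for col in row if col != key}

    joined = []
    for left in left_rows:
        matches = groups.get(left[key])
        if matches:
            # One output row per matching right row (cross product per key).
            for right in matches:
                joined.append({**left, **right})
        else:
            # No match: keep the left row and fill right columns with None.
            joined.append({**left, **missing})
    return joined


work_orders = [
    {"workOrderCode": 20050, "workOrderDescription": "Door Repair"},
    {"workOrderCode": 20051, "workOrderDescription": "Lift Replacement"},
    {"workOrderCode": 20060, "workOrderDescription": "Window Cleaning"},
]
work_order_scores = [
    {"workOrderCode": 20050, "runTime": 20100112, "score": 100},
    {"workOrderCode": 20050, "runTime": 20100130, "score": 120},
    {"workOrderCode": 20051, "runTime": 20100215, "score": 80},
]

result = left_join(work_orders, work_order_scores, "workOrderCode")
for row in result:
    print(row)
```

The sketch produces four rows, matching the shape of the pandas result in the question: two for 20050, one for 20051, and one for 20060 with the score columns left empty.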
