I'm trying to implement a split function in Erlang that is supposed to split a string like "i am on the mountain top" into a list like ["i","am","on","the","mountain","top"].
Here is my code (exercise.erl):
-module(exercise).
-import(oi,[read/1]).
-export([split/4]).
split(Text,_,Result,_) when Text == [] -> Result;
split([Head|Tail],Separator,Result,WordSummer) when Head == Separator ->
    split(Tail,Separator,[Result|lists:flatten(WordSummer)],[]);
split([Head|Tail],Separator,Result,WordSummer) ->
    split(Tail,Separator,Result,[WordSummer|Head]).
The problem I'm having is that when calling my exported function I get the following error:
9> c(exercise).
{ok,exercise}
10> exercise:split("sdffdgfdg dgdfgfg dgdfg dgdfgd dfgdfgdfgtrty hghfgh",$ ,[],[]).
** exception error: no function clause matching lists:do_flatten(103,[]) (lists.erl, line 627)
in function lists:do_flatten/2 (lists.erl, line 628)
in call from exercise:split/4 (exercise.erl, line 9)
11>
How can I solve this?
Two things:
The [WordSummer|Head] in the last line is creating an improper list because Head is an integer (one character of the input string). This is causing the error you're seeing. You probably meant [WordSummer, Head].
[Result|lists:flatten(WordSummer)] is creating a nested list instead of a list of strings. To append one item to a list, use ++ and wrap the right side in a list: Result ++ [lists:flatten(WordSummer)]
Final code:
split(Text,_,Result,_) when Text == [] -> Result;
split([Head|Tail],Separator,Result,WordSummer) when Head == Separator ->
    split(Tail,Separator,Result ++ [lists:flatten(WordSummer)],[]);
split([Head|Tail],Separator,Result,WordSummer) ->
    split(Tail,Separator,Result,[WordSummer, Head]).
Test:
1> c(exercise).
{ok,exercise}
2> exercise:split("sdffdgfdg dgdfgfg dgdfg dgdfgd dfgdfgdfgtrty hghfgh",$ ,[],[]).
["sdffdgfdg","dgdfgfg","dgdfg","dgdfgd","dfgdfgdfgtrty"]
There's still a bug where the last segment is being ignored. I'll let you figure that out (hint: you need to consider WordSummer in the first clause of the function).
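In case you get stuck, one possible shape of the fix (spoiler, just a sketch): split the empty-input clause in two, so the leftover WordSummer is flushed when the string runs out. Only the first two clauses differ from the corrected code above:

```erlang
%% Sketch of the missing-last-word fix: when the input is exhausted,
%% flush whatever is still accumulated in WordSummer before returning.
split([], _, Result, []) -> Result;
split([], _, Result, WordSummer) -> Result ++ [lists:flatten(WordSummer)];
split([Head|Tail], Separator, Result, WordSummer) when Head == Separator ->
    split(Tail, Separator, Result ++ [lists:flatten(WordSummer)], []);
split([Head|Tail], Separator, Result, WordSummer) ->
    split(Tail, Separator, Result, [WordSummer, Head]).
```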
I'm a C# developer and this is my first attempt at writing F#.
I'm trying to read a Dashlane exported database in CSV format. These files have no headers and a dynamic number of columns for each possible type of entry. The following file is an example of dummy data that I use to test my software. It only contains password entries, and yet they have between 5 and 7 columns (I'll decide how to handle the other types of data later).
The first line of the exported file (in this case, but not always) is the email address that was used to create the Dashlane account, which makes this line only one column wide.
"accountCreation#email.fr"
"Nom0","siteweb0","Identifiant0","",""
"Nom1","siteweb1","identifiant1","email1#email.email","",""
"Nom2","siteweb2","email2#email.email","",""
"Nom3","siteweb3","Identifiant3","password3",""
"Nom4","siteweb4","Identifiant4","email4#email.email","password4",""
"Nom5","siteweb5","Identifiant5","email5#email.email","SecondIdentifiant5","password5",""
"Nom6","siteweb6","Identifiant6","email6#email.email","SecondIdentifiant6","password6","this is a single-line note"
"Nom7","siteweb7","Identifiant7","email7#email.email","SecondIdentifiant7","password7","this is a
multi
line note"
"Nom8","siteweb8","Identifiant8","email8#email.email","SecondIdentifiant8","password8","single line note"
I'm trying to print the first column of each row to the console as a start
let rawCsv = CsvFile.Load("path\to\file.csv", ",", '"', false)
for row in rawCsv.Rows do
    printfn "value %s" row.[0]
This code gives me the following error on the for line:
Couldn't parse row 2 according to schema: Expected 1 columns, got 5
I haven't given the CsvFile any schema, and I couldn't find on the internet how to specify one.
I could remove the first line dynamically if I wanted to, but that wouldn't change anything, since the other lines have different column counts too.
Is there any way to parse this awkward CSV file in F#?
Note: For each password row, only the column right before the last one matters to me (the password column)
I do not think that a CSV file with a structure as irregular as yours is a good candidate for processing with the CSV Type Provider or the CSV Parser.
At the same time, it does not seem difficult to parse this file the way you want with a few lines of custom logic. The following snippet:
open System
open System.IO
File.ReadAllLines("Sample.csv") // Get data
|> Array.filter(fun x -> x.StartsWith("\"Nom")) // Only lines starting with "Nom may contain password
|> Array.map (fun x -> x.Split(',') |> Array.map (fun x -> x.[1..(x.Length-2)])) // Split each line into "cells"
|> Array.filter(fun x -> x.[x.Length-2] |> String.IsNullOrEmpty |> not) // Take only those having non-empty cell before the last one
|> Array.map (fun x -> x.[0],x.[x.Length-2]) // show the line key and the password
after parsing your sample file produces
>
val it : (string * string) [] =
[|("Nom3", "password3"); ("Nom4", "password4"); ("Nom5", "password5");
("Nom6", "password6"); ("Nom7", "password7"); ("Nom8", "password8")|]
>
It may be a good starting point for further improving the parsing logic to perfection.
I propose to read the CSV file as a plain text file: read it line by line to form a list, then parse each line with CsvFile.Parse. But the problem is that the elements end up in Headers (which has type string [] option) instead of in Rows.
open FSharp.Data
open System.IO
let readLines (filePath:string) = seq {
    use sr = new StreamReader(filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}
[<EntryPoint>]
let main argv =
    let lines = readLines "c:\path_to_file\example.csv"
    let rows = List.map (fun str -> CsvFile.Parse(str)) (Seq.toList lines)
    for row in List.toArray(rows) do
        printfn "New Line"
        if row.Headers.IsSome then
            for r in row.Headers.Value do
                printfn "value %s" r
    printfn "%A" argv
    0 // return an integer exit code
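The Headers-vs-Rows issue can likely be fixed without giving up on CsvFile: by default, CsvFile.Parse treats the first line of the text it is given as a header row, and since each parsed chunk here is a single line, everything lands in Headers. Passing the optional hasHeaders = false argument should make the data appear in Rows instead. A sketch, assuming FSharp.Data's CsvFile.Parse overload with the hasHeaders parameter:

```fsharp
open FSharp.Data

// Parse one CSV line; hasHeaders = false keeps the line in Rows
// instead of consuming it as a header row.
let parseLine (line: string) =
    CsvFile.Parse(line, hasHeaders = false)

let csv = parseLine "\"Nom3\",\"siteweb3\",\"Identifiant3\",\"password3\",\"\""
for row in csv.Rows do
    printfn "first column: %s" row.[0]  // the parser strips the surrounding quotes
```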
I'm trying to log all calls to a function with dbg for debugging (thanks to this answer). Here's the code:
-module(a).
-export([main/0]).
trace_me(_, _, _) ->
    ok.

main() ->
    dbg:start(),
    dbg:tracer(),
    dbg:tpl(a, trace_me, 3, []),
    dbg:p(all, c),
    LargeBinary = binary:copy(<<"foo">>, 10000),
    trace_me(foo, bar, LargeBinary).
The problem is that one of the arguments is a really large binary, and the code above prints the complete binary on every call:
1> c(a).
{ok,a}
2> a:main().
(<0.57.0>) call a:trace_me(foo,bar,<<"foofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo...lots of foos omitted...">>)
ok
Is it possible to (without modifying trace_me/3):
Only print the first 2 arguments for each call?
Print the first 2 arguments + first few bytes of the 3rd argument or just pass the 3rd argument through a custom function before printing?
I don't know how to do this with dbg, but you can use the very good and small redbug:
trace_me(_, _, _) ->
    ok.

do_redbug() ->
    % 1. print arguments as dbg does:
    % redbug:start("a:trace_me"),
    % 2. print arity only:
    % redbug:start("a:trace_me", [arity]),
    % 3. specify the formatting depth:
    redbug:start("a:trace_me", [{print_depth, 10}]),
    LargeBinary = binary:copy(<<"foo">>, 100000),
    trace_me(foo, bar, LargeBinary).
Have a look at the (quite terse) redbug documentation in the above link for more options and for how to pass your own printer/formatter function.
Note also that redbug, as opposed to dbg, is safe to use on a production system. Check out this presentation by the redbug author: Taking the printf out of printf Debugging.
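If you do want to stay with plain dbg, a custom handler fun passed to dbg:tracer/2 also works: it receives each raw trace message, so you can format the arguments however you like. A sketch (the {trace, Pid, call, {M, F, Args}} shape is the standard call trace message):

```erlang
%% Sketch: replace the default tracer with one that prints only the
%% first two arguments plus the size of the third.
dbg:tracer(process,
           {fun({trace, Pid, call, {M, F, [A1, A2, A3]}}, State) when is_binary(A3) ->
                    io:format("(~p) call ~p:~p(~p, ~p, <<~p bytes>>)~n",
                              [Pid, M, F, A1, A2, byte_size(A3)]),
                    State;
               (Msg, State) ->
                    io:format("~p~n", [Msg]),
                    State
            end, ok}),
dbg:tpl(a, trace_me, 3, []),
dbg:p(all, c).
```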
I am trying to do mapreduce on Riak with Erlang. I am having data like the following:
Bucket = "Numbers"
{Key, Value} pairs: {RandomKey1, 1}, {RandomKey2, 2}, ..., {RandomKey1000, 1000}
Now, I am storing 1000 values, from 1 to 1000; all the keys are autogenerated by passing the term undefined as the key parameter.
I want the data from only the entries whose values are even numbers. How can I achieve this using MapReduce?
You would construct phase functions as described in http://docs.basho.com/riak/latest/dev/advanced/mapreduce/
One possible map function:
Mapfun = fun(Object, _KeyData, _Arg) ->
    %% get the object value, convert to integer and check if even
    Value = list_to_integer(binary_to_term(riak_object:get_value(Object))),
    case Value rem 2 of
        0 -> [Value];
        1 -> []
    end
end.
Although you probably want to not completely fail in the event you encounter a sibling:
Mapfun = fun(Object, _KeyData, _Arg) ->
    Values = riak_object:get_values(Object),
    case length(Values) of      %% checking for siblings
        1 ->                    %% only 1 value == no siblings
            I = list_to_integer(binary_to_term(hd(Values))),
            case I rem 2 of
                0 -> [I];       %% value is even
                1 -> []         %% value is odd
            end;
        _ -> []                 %% What should happen with siblings?
    end
end.
There may also be other cases you need to either prevent or check for: values containing non-numeric characters, empty values, deleted values (tombstones), just to name a few.
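One way (just a sketch) to collapse all of those failure modes into "skip this object" is to wrap the body of the map function in a try expression:

```erlang
%% Sketch: any object that has siblings, does not hold a well-formed
%% integer, or is otherwise malformed contributes nothing to the result.
Mapfun = fun(Object, _KeyData, _Arg) ->
    try
        [V] = riak_object:get_values(Object),      %% fails on siblings
        I = list_to_integer(binary_to_term(V)),    %% fails on non-numeric values
        case I rem 2 of
            0 -> [I];
            _ -> []
        end
    catch
        _:_ -> []   %% siblings, tombstones, garbage: skip
    end
end.
```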
Edit:
A word of caution: Doing a full-bucket MapReduce job will require Riak to read every value from the disk, this could cause extreme latency and timeout on a sizeable data set. Probably not something you want to do in production.
A full example of performing MapReduce (limited to the numbers 1 to 200 for space considerations):
Assuming that you have cloned and built the riak-erlang-client
Using the second Mapfun from above
erl -pa {path-to-riak-erlang-client}/ebin
Define a reduce function to sort the list
Reducefun = fun(List, _) ->
    lists:sort(List)
end.
Attach to the local Riak server
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
Generate some test data
[ riakc_pb_socket:put(
      Pid,
      riakc_obj:new(
          <<"numbers">>,
          list_to_binary("Key" ++ V), V
      )
  ) || V <- [ integer_to_list(Itr) || Itr <- lists:seq(1,200)]],
The function to perform a MapReduce with this client is
mapred(pid(), mapred_inputs(), [mapred_queryterm()])
The mapred_queryterm is a list of phase specification of the form {Type, FunTerm, Arg, Keep} as defined in the readme. For this example, there are 2 phases:
a map phase that selects only even numbers
{map, Mapfun, none, true}
a reduce phase that sorts the result
{reduce, Reducefun, none, true}
Perform the MapReduce query
{ok, Results} = riakc_pb_socket:mapred(
    Pid,           %% The socket pid from above
    <<"numbers">>, %% Input is the bucket
    [{map, {qfun, Mapfun}, none, true},
     {reduce, {qfun, Reducefun}, none, true}]
),
Results will be a list of {PhaseIndex, PhaseOutput} tuples, with a separate entry for each phase for which Keep was true. Both phases are marked keep here, so Results will be
[{0,[_map phase result_]},{1,[_reduce phase result_]}]
Print out the result of each phase:
[ io:format("MapReduce Result of phase ~p:~n~P~n",[P,Result,500])
|| {P,Result} <- Results ].
When I ran this, my output was:
MapReduce Result of phase 0:
[182,132,174,128,8,146,18,168,70,98,186,118,50,28,22,112,82,160,114,106,12,26,
124,14,194,64,122,144,172,96,126,162,58,170,108,44,90,104,6,196,40,154,94,
120,76,48,150,52,4,62,140,178,2,142,100,166,192,66,16,36,38,88,102,68,34,32,
30,164,110,42,92,138,86,54,152,116,156,72,134,200,148,46,10,176,198,84,56,78,
130,136,74,190,158,24,184,180,80,60,20,188]
MapReduce Result of phase 1:
[2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,
56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,
104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,
142,144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,176,178,
180,182,184,186,188,190,192,194,196,198,200]
[ok,ok]