I'm trying to implement a split function in Erlang that is supposed to split a string like "i am on the mountain top" into a list like ["i","am","on","the","mountain","top"].
Here is my code (exercise.erl):
-module(exercise).
-import(oi,[read/1]).
-export([split/4]).
split(Text,_,Result,_) when Text == [] -> Result;
split([Head|Tail],Separator,Result,WordSummer) when Head == Separator ->
    split(Tail,Separator,[Result|lists:flatten(WordSummer)],[]);
split([Head|Tail],Separator,Result,WordSummer) ->
    split(Tail,Separator,Result,[WordSummer|Head]).
The problem I'm having is that when calling my exported function I get the following error:
9> c(exercise).
{ok,exercise}
10> exercise:split("sdffdgfdg dgdfgfg dgdfg dgdfgd dfgdfgdfgtrty hghfgh",$ ,[],[]).
** exception error: no function clause matching lists:do_flatten(103,[]) (lists.erl, line 627)
in function lists:do_flatten/2 (lists.erl, line 628)
in call from exercise:split/4 (exercise.erl, line 9)
11>
How can I solve this?
Two things:
The [WordSummer|Head] in the last line is creating an improper list because Head is an integer (one character of the input string). This is causing the error you're seeing. You probably meant [WordSummer, Head].
[Result|lists:flatten(WordSummer)] is creating a nested list instead of a list of strings. To append one item to a list, use ++ and wrap the right side in a list: Result ++ [lists:flatten(WordSummer)]
Final code:
split(Text,_,Result,_) when Text == [] -> Result;
split([Head|Tail],Separator,Result,WordSummer) when Head == Separator ->
    split(Tail,Separator,Result ++ [lists:flatten(WordSummer)],[]);
split([Head|Tail],Separator,Result,WordSummer) ->
    split(Tail,Separator,Result,[WordSummer, Head]).
Test:
1> c(exercise).
{ok,exercise}
2> exercise:split("sdffdgfdg dgdfgfg dgdfg dgdfgd dfgdfgdfgtrty hghfgh",$ ,[],[]).
["sdffdgfdg","dgdfgfg","dgdfg","dgdfgd","dfgdfgdfgtrty"]
There's still a bug where the last segment is being ignored. I'll let you figure that out (hint: you need to consider WordSummer in the first clause of the function).
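In case you get stuck, one possible shape of the fix (spoiler, just a sketch): split the empty-input clause in two, so the leftover WordSummer is flushed when the string runs out. Only the first two clauses differ from the corrected code above:

```erlang
%% Sketch of the missing-last-word fix: when the input is exhausted,
%% flush whatever is still accumulated in WordSummer before returning.
split([], _, Result, []) -> Result;
split([], _, Result, WordSummer) -> Result ++ [lists:flatten(WordSummer)];
split([Head|Tail], Separator, Result, WordSummer) when Head == Separator ->
    split(Tail, Separator, Result ++ [lists:flatten(WordSummer)], []);
split([Head|Tail], Separator, Result, WordSummer) ->
    split(Tail, Separator, Result, [WordSummer, Head]).
```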
I'm a C# developer and this is my first attempt at writing F#.
I'm trying to read a Dashlane exported database in CSV format. These files have no headers and a dynamic number of columns for each possible type of entry. The following file is an example of dummy data that I use to test my software. It only contains password entries, and yet they have between 5 and 7 columns (I'll decide how to handle the other types of data later).
The first line of the exported file (in this case, but not always) is the email address that was used to create the Dashlane account, which makes this line only one column wide.
"accountCreation#email.fr"
"Nom0","siteweb0","Identifiant0","",""
"Nom1","siteweb1","identifiant1","email1#email.email","",""
"Nom2","siteweb2","email2#email.email","",""
"Nom3","siteweb3","Identifiant3","password3",""
"Nom4","siteweb4","Identifiant4","email4#email.email","password4",""
"Nom5","siteweb5","Identifiant5","email5#email.email","SecondIdentifiant5","password5",""
"Nom6","siteweb6","Identifiant6","email6#email.email","SecondIdentifiant6","password6","this is a single-line note"
"Nom7","siteweb7","Identifiant7","email7#email.email","SecondIdentifiant7","password7","this is a
multi
line note"
"Nom8","siteweb8","Identifiant8","email8#email.email","SecondIdentifiant8","password8","single line note"
I'm trying to print the first column of each row to the console as a start
let rawCsv = CsvFile.Load("path\to\file.csv", ",", '"', false)
for row in rawCsv.Rows do
    printfn "value %s" row.[0]
This code gives me the following error on the for line:
Couldn't parse row 2 according to schema: Expected 1 columns, got 5
I haven't given the CsvFile any schema, and I couldn't find on the internet how to specify one.
I could remove the first line dynamically if I wanted to, but that wouldn't change anything, since the other lines have different column counts too.
Is there any way to parse this awkward CSV file in F#?
Note: For each password row, only the column right before the last one matters to me (the password column)
I do not think that a CSV file with a structure as irregular as yours is a good candidate for processing with the CSV Type Provider or the CSV Parser.
At the same time, it does not seem difficult to parse this file the way you want with a few lines of custom logic. The following snippet:
open System
open System.IO
File.ReadAllLines("Sample.csv") // Get data
|> Array.filter(fun x -> x.StartsWith("\"Nom")) // Only lines starting with "Nom may contain password
|> Array.map (fun x -> x.Split(',') |> Array.map (fun x -> x.[1..(x.Length-2)])) // Split each line into "cells"
|> Array.filter(fun x -> x.[x.Length-2] |> String.IsNullOrEmpty |> not) // Take only those having non-empty cell before the last one
|> Array.map (fun x -> x.[0],x.[x.Length-2]) // show the line key and the password
after parsing your sample file produces
>
val it : (string * string) [] =
[|("Nom3", "password3"); ("Nom4", "password4"); ("Nom5", "password5");
("Nom6", "password6"); ("Nom7", "password7"); ("Nom8", "password8")|]
>
It may be a good starting point for further improving the parsing logic to perfection.
I propose to read the CSV file as a plain text file: read it line by line to form a list, then parse each line with CsvFile.Parse. But the problem is that the elements end up in Headers (which has type string [] option) instead of in Rows.
open FSharp.Data
open System.IO
let readLines (filePath:string) = seq {
    use sr = new StreamReader(filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}
[<EntryPoint>]
let main argv =
    let lines = readLines "c:\path_to_file\example.csv"
    let rows = List.map (fun str -> CsvFile.Parse(str)) (Seq.toList lines)
    for row in List.toArray(rows) do
        printfn "New Line"
        if row.Headers.IsSome then
            for r in row.Headers.Value do
                printfn "value %s" r
    printfn "%A" argv
    0 // return an integer exit code
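The Headers-vs-Rows issue can likely be fixed without giving up on CsvFile: by default, CsvFile.Parse treats the first line of the text it is given as a header row, and since each parsed chunk here is a single line, everything lands in Headers. Passing the optional hasHeaders = false argument should make the data appear in Rows instead. A sketch, assuming FSharp.Data's CsvFile.Parse overload with the hasHeaders parameter:

```fsharp
open FSharp.Data

// Parse one CSV line; hasHeaders = false keeps the line in Rows
// instead of consuming it as a header row.
let parseLine (line: string) =
    CsvFile.Parse(line, hasHeaders = false)

let csv = parseLine "\"Nom3\",\"siteweb3\",\"Identifiant3\",\"password3\",\"\""
for row in csv.Rows do
    printfn "first column: %s" row.[0]  // the parser strips the surrounding quotes
```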
I'm trying to log all calls to a function with dbg for debugging (thanks to this answer). Here's the code:
-module(a).
-export([main/0]).
trace_me(_, _, _) ->
    ok.

main() ->
    dbg:start(),
    dbg:tracer(),
    dbg:tpl(a, trace_me, 3, []),
    dbg:p(all, c),
    LargeBinary = binary:copy(<<"foo">>, 10000),
    trace_me(foo, bar, LargeBinary).
The problem is that one of the arguments is a really large binary, and the code above prints the complete binary on every call:
1> c(a).
{ok,a}
2> a:main().
(<0.57.0>) call a:trace_me(foo,bar,<<"foofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo...lots of foos omitted...">>)
ok
Is it possible to (without modifying trace_me/3):
Only print the first 2 arguments for each call?
Print the first 2 arguments + first few bytes of the 3rd argument or just pass the 3rd argument through a custom function before printing?
I don't know how to do this with dbg, but you can use the very good and small redbug:
trace_me(_, _, _) ->
    ok.

do_redbug() ->
    % 1. print arguments as dbg does:
    % redbug:start("a:trace_me"),
    % 2. print arity only:
    % redbug:start("a:trace_me", [arity]),
    % 3. specify the formatting depth:
    redbug:start("a:trace_me", [{print_depth, 10}]),
    LargeBinary = binary:copy(<<"foo">>, 100000),
    trace_me(foo, bar, LargeBinary).
Have a look at the (quite terse) redbug documentation in the above link for more options and for how to pass your own printer/formatter function.
Note also that redbug, as opposed to dbg, is safe to use on a production system. Check out this presentation by the redbug author: Taking the printf out of printf Debugging.
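If you do want to stay with plain dbg, a custom handler fun passed to dbg:tracer/2 also works: it receives each raw trace message, so you can format the arguments however you like. A sketch (the {trace, Pid, call, {M, F, Args}} shape is the standard call trace message):

```erlang
%% Sketch: replace the default tracer with one that prints only the
%% first two arguments plus the size of the third.
dbg:tracer(process,
           {fun({trace, Pid, call, {M, F, [A1, A2, A3]}}, State) when is_binary(A3) ->
                    io:format("(~p) call ~p:~p(~p, ~p, <<~p bytes>>)~n",
                              [Pid, M, F, A1, A2, byte_size(A3)]),
                    State;
               (Msg, State) ->
                    io:format("~p~n", [Msg]),
                    State
            end, ok}),
dbg:tpl(a, trace_me, 3, []),
dbg:p(all, c).
```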
I am trying to do mapreduce on Riak with Erlang. I am having data like the following:
Bucket = "Numbers"
{Key, Value} pairs: {RandomKey1, 1}, {RandomKey2, 2}, ..., {RandomKey1000, 1000}
Now, I am storing 1000 values, from 1 to 1000; all the keys are autogenerated by passing the term undefined as the key parameter.
I want the data from only the entries whose values are even numbers. How can I achieve this using MapReduce?
You would construct phase functions as described in http://docs.basho.com/riak/latest/dev/advanced/mapreduce/
One possible map function:
Mapfun = fun(Object, _KeyData, _Arg) ->
    %% get the object value, convert to integer and check if even
    Value = list_to_integer(binary_to_term(riak_object:get_value(Object))),
    case Value rem 2 of
        0 -> [Value];
        1 -> []
    end
end.
Although you probably want to not completely fail in the event you encounter a sibling:
Mapfun = fun(Object, _KeyData, _Arg) ->
    Values = riak_object:get_values(Object),
    case length(Values) of      %% checking for siblings
        1 ->                    %% only 1 value == no siblings
            I = list_to_integer(binary_to_term(hd(Values))),
            case I rem 2 of
                0 -> [I];       %% value is even
                1 -> []         %% value is odd
            end;
        _ -> []                 %% What should happen with siblings?
    end
end.
There may also be other cases you need to either prevent or check for: values containing non-numeric characters, empty values, deleted values (tombstones), just to name a few.
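One way (just a sketch) to collapse all of those failure modes into "skip this object" is to wrap the body of the map function in a try expression:

```erlang
%% Sketch: any object that has siblings, does not hold a well-formed
%% integer, or is otherwise malformed contributes nothing to the result.
Mapfun = fun(Object, _KeyData, _Arg) ->
    try
        [V] = riak_object:get_values(Object),      %% fails on siblings
        I = list_to_integer(binary_to_term(V)),    %% fails on non-numeric values
        case I rem 2 of
            0 -> [I];
            _ -> []
        end
    catch
        _:_ -> []   %% siblings, tombstones, garbage: skip
    end
end.
```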
Edit:
A word of caution: Doing a full-bucket MapReduce job will require Riak to read every value from the disk, this could cause extreme latency and timeout on a sizeable data set. Probably not something you want to do in production.
A full example of performing MapReduce (limited to the numbers 1 to 200 for space considerations):
Assuming that you have cloned and built the riak-erlang-client
Using the second Mapfun from above
erl -pa {path-to-riak-erlang-client}/ebin
Define a reduce function to sort the list
Reducefun = fun(List, _) ->
    lists:sort(List)
end.
Attach to the local Riak server
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
Generate some test data
[ riakc_pb_socket:put(
      Pid,
      riakc_obj:new(
          <<"numbers">>,
          list_to_binary("Key" ++ V), V
      )
  ) || V <- [ integer_to_list(Itr) || Itr <- lists:seq(1,200)]],
The function to perform a MapReduce with this client is
mapred(pid(), mapred_inputs(), [mapred_queryterm()])
The mapred_queryterm is a list of phase specification of the form {Type, FunTerm, Arg, Keep} as defined in the readme. For this example, there are 2 phases:
a map phase that selects only even numbers
{map, Mapfun, none, true}
a reduce phase that sorts the result
{reduce, Reducefun, none, true}
Perform the MapReduce query
{ok, Results} = riakc_pb_socket:mapred(
    Pid,           %% The socket pid from above
    <<"numbers">>, %% Input is the bucket
    [{map, {qfun, Mapfun}, none, true},
     {reduce, {qfun, Reducefun}, none, true}]
),
Results will be a list of {PhaseIndex, PhaseOutput} tuples, with a separate entry for each phase for which Keep was true. Both phases are marked keep here, so Results will be
[{0,[_map phase result_]},{1,[_reduce phase result_]}]
Print out the result of each phase:
[ io:format("MapReduce Result of phase ~p:~n~P~n",[P,Result,500])
|| {P,Result} <- Results ].
When I ran this, my output was:
MapReduce Result of phase 0:
[182,132,174,128,8,146,18,168,70,98,186,118,50,28,22,112,82,160,114,106,12,26,
124,14,194,64,122,144,172,96,126,162,58,170,108,44,90,104,6,196,40,154,94,
120,76,48,150,52,4,62,140,178,2,142,100,166,192,66,16,36,38,88,102,68,34,32,
30,164,110,42,92,138,86,54,152,116,156,72,134,200,148,46,10,176,198,84,56,78,
130,136,74,190,158,24,184,180,80,60,20,188]
MapReduce Result of phase 1:
[2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,
56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,
104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,
142,144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,176,178,
180,182,184,186,188,190,192,194,196,198,200]
[ok,ok]