File IO and list processing - erlang

I asked a similar question before; I'm not sure what wasn't clear about it, but I'll try again. I have a file named file.txt, which I read into a list. When I print the list to the console, it shows:
blah
blah
blah
blah
That is fine. Perfect :) Now how would I write that to a new file, so that the new file contains:
blah
blah
blah
blah
Nothing more and nothing less. Here is the code I am using to read a file into a list:
{ok, Device} = file:open("file.txt", [read]),
Li = readdata(Device, []).
readdata(Device, Accum) ->
    case io:get_line(Device, "") of
        eof -> file:close(Device), Accum;
        Line -> readdata(Device, Accum ++ [Line])
    end.
So again, the new file should display EXACTLY what the file I read displays: no extra characters, not all on one line, etc. Just the same :)

Well, the easy way is:
ok = file:write_file("output.txt", Li).
As you may see in http://www.erlang.org/doc/man/file.html , there are plenty of useful functions like file:read_file/1 that may shorten your program and at the same time make it a little quicker.
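As a rough sketch of that shorter route (the module name copy_demo and the function copy_whole/2 are made up for illustration), reading the whole file as a binary and writing it back should preserve every byte, newlines included. It assumes the file fits comfortably in memory:

```erlang
-module(copy_demo).
-export([copy_whole/2]).

%% Read Source fully into memory as a binary and write it unchanged to Dest.
%% Returns ok on success or {error, Reason} from whichever call failed.
copy_whole(Source, Dest) ->
    case file:read_file(Source) of
        {ok, Bin} -> file:write_file(Dest, Bin);
        {error, _} = Error -> Error
    end.
```

Calling copy_demo:copy_whole("file.txt", "output.txt") would then replace both the reading loop and the write.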
You see, the way you combine the data you read with the accumulator is not ideal: Accum ++ [Line] copies Accum on every call, so your readdata/2 function takes O(N^2) time. Prepending to the head of the list is the better way, but of course you then have to reverse the accumulator at the end.
And what about the size of the file? If it is huge and doesn't fit into memory, you'll have problems even if you use the accumulator properly. The standard way in this case is to open both files, read a chunk of data, and immediately write it to the output.
copy_file() ->
    {ok, In} = file:open("input", [read]),
    {ok, Out} = file:open("output", [write]),
    copy_file(In, Out),
    file:close(In),
    file:close(Out).
copy_file(In, Out) ->
    case file:read(In, 1024 * 64) of
        {ok, Data} ->
            ok = file:write(Out, Data),
            copy_file(In, Out);
        _ ->
            ok
    end.
I haven't tried the code, it may not compile, I just tried to show the basic idea.
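For a plain copy, the standard library also provides file:copy/2, which avoids loading the file into a list at all. A minimal sketch, with filecopy_demo and copy/2 as illustrative names only:

```erlang
-module(filecopy_demo).
-export([copy/2]).

%% file:copy/2 copies Source to Destination and returns {ok, BytesCopied}.
%% This wrapper crashes on error and returns ok on success.
copy(Source, Destination) ->
    {ok, _BytesCopied} = file:copy(Source, Destination),
    ok.
```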

So this is what I came up with. I modified your readdata/2 slightly to optimize the append and remove the newline. The write/2 function uses lists:foreach/2 and io:fwrite/3 to write to the file.
-module(rwlist).
-export([read/1,write/2]).
read(FileName) ->
    case file:open(FileName, [read]) of
        {ok, Device} ->
            readdata(Device, [])
    end.
readdata(Device, Accum) ->
    case io:get_line(Device, "") of
        eof -> file:close(Device), lists:reverse(Accum);
        Line -> readdata(Device, [(Line--"\n")|Accum])
    end.
write(FileName, List) ->
    case file:open(FileName, [write]) of
        {ok, Device} ->
            lists:foreach(fun(Line) -> writeline(Device, Line) end, List),
            file:close(Device)
    end.
writeline(Device, Line) -> writeline(Device, Line, os:type()).
writeline(Device, Line, {win32,_}) -> io:fwrite(Device, "~s\r\n", [Line]);
writeline(Device, Line, _) -> io:fwrite(Device, "~s\n", [Line]).
Here's the test...
57> List=rwlist:read("list").
["item 1","item 2","item 3","item 4"]
58> rwlist:write("list2", List).
ok
59> List2=rwlist:read("list2").
["item 1","item 2","item 3","item 4"]
Of course if you are just copying a file Dmitry's answer is better.

How should Erlang filter the elements in the list, and add punctuation and []?

-module(solarSystem).
-export([process_csv/1, is_numeric/1, parseALine/2, parse/1, expandT/1, expandT/2,
         parseNames/1]).
parseALine(false, T) ->
    T;
parseALine(true, T) ->
    T.
parse([Name, Colour, Distance, Angle, AngleVelocity, Radius, "1" | T]) ->
    T; % Where T is a list of names of other objects in the solar system
parse([Name, Colour, Distance, Angle, AngleVelocity, Radius | T]) ->
    T.
parseNames([H | T]) ->
    H.
expandT(T) ->
    T.
expandT([], Sep) ->
    [];
expandT([H | T], Sep) ->
    T.
% https://rosettacode.org/wiki/Determine_if_a_string_is_numeric#Erlang
is_numeric(L) ->
    S = trim(L, ""),
    Float = (catch erlang:list_to_float(S)),
    Int = (catch erlang:list_to_integer(S)),
    is_number(Float) orelse is_number(Int).
trim(A) ->
    A.
trim([], A) ->
    A;
trim([32 | T], A) ->
    trim(T, A);
trim([H | T], A) ->
    trim(T, A ++ [H]).
process_csv(L) ->
    X = parse(L),
    expandT(X).
The problem is that a main function will call the process_csv/1 function in my module, and L will be the contents of a file like this:
[["name "," col"," dist"," a"," angv"," r "," ..."],["apollo11 ","white"," 0.1"," 0"," 77760"," 0.15"]]
Or like this:
["planets ","earth","venus "]
Or like this:
["a","b"]
I need to display it as follows:
apollo11 =["white", 0.1, 0, 77760, 0.15,[]];
Planets =[earth,venus]
a,b
[[59],[97],[44],[98]]
My problem is that no matter what changes I make, it only shows part of the output, and without the punctuation. I can't work out how to split the list, so I can't find a way.
In addition, because Erlang is a niche programming language, I can't even find examples online.
So, can anyone help me? Thank you, very much.
In addition, I am restricted from using recursion.
I think the first problem is that it is hard to link what you are trying to achieve with what your code says so far. Therefore, this feedback may not be exactly what you are looking for, but it might give you some ideas. Let's structure the problem into the common elements: (1) input, (2) process, and (3) output.
Input
You mentioned that L will be a file, but I assume it is a line in a file, where each line can be one of the three samples. The samples also do not follow a consistent pattern. For this, we can build a function that converts each line of the file into an Erlang term and passes the result to the next step.
Process
The question also does not mention the specific logic for parsing/processing the input. You also seem to care about the data types, so we will convert and display the result accordingly. Erlang, as a functional language, naturally works with lists, so in most cases we will need functions from the lists module.
Output
You didn't specifically mention where you want to display the result (an output file, screen/erlang shell, etc), so let's assume you just want to display it in the standard output/erlang shell.
Sample file content test1.txt (please note the dot at the end of each line)
[["name "," col"," dist"," a"," angv"," r "],["apollo11 ","white","0.1"," 0"," 77760"," 0.15"]].
["planets ","earth","venus "].
["a","b"].
How to run: solarSystem:process_file("/Users/macbook/Documents/test1.txt").
Sample Result:
(dev01@Macbooks-MacBook-Pro-3)3> solarSystem:process_file("/Users/macbook/Documents/test1.txt").
apollo11 = ["white",0.1,0,77760,0.15]
planets = ["earth","venus"]
a = ["b"]
Done processing 3 line(s)
ok
Module code:
-module(solarSystem).
-export([process_file/1]).
-export([process_line/2]).
-export([format_item/1]).
%% This is the main function; the input is the full path of the file
%% How to call: solarSystem:process_file("file_full_path").
process_file(Filename) ->
    %% Use file:consult to convert the file content into Erlang terms
    %% Each term in the file must be terminated by a dot (".")
    {StatusOpen, Result} = file:consult(Filename),
    case StatusOpen of
        ok ->
            %% Result is a list, so each element is handled with a lists function
            Ctr = lists:foldl(fun process_line/2, 0, Result),
            io:format("Done processing ~p line(s) ~n", [Ctr]);
        _ -> %% This is for the case where the file is not available
            io:format("Error converting file ~p due to '~p' ~n", [Filename, Result])
    end.
process_line(Term, CtrIn) ->
    %% Assume there are only a few possible shapes of element. There are many ways to process the data as long as the input pattern is clear.
    %% We basically need to identify all the possibilities and handle them accordingly.
    %% Of course there are smarter (dynamic) ways to handle them, but the code below may give you some ideas.
    case Term of
        %% 1. This handles the pattern -> [["name "," col"," dist"," a"," angv"," r "],["apollo11 ","white"," 0.1"," 0"," 77760"," 0.15"]]
        [[_, _, _, _, _, _], [Name | OtherParams]] ->
            %% At this point, Name = "apollo11 ", OtherParams = ["white"," 0.1"," 0"," 77760"," 0.15"]
            OtherParamsFmt = lists:map(fun format_item/1, OtherParams),
            %% Display the result on standard output
            io:format("~s = ~p ~n", [string:trim(Name), OtherParamsFmt]);
        %% 2. This handles the pattern -> ["planets ","earth","venus "]
        [Name | OtherParams] ->
            %% At this point, Name = "planets ", OtherParams = ["earth","venus "]
            OtherParamsFmt = lists:map(fun format_item/1, OtherParams),
            %% Display the result on standard output
            io:format("~s = ~p ~n", [string:trim(Name), OtherParamsFmt]);
        %% 3. Other cases
        _ ->
            %% Display the warning on standard output
            io:format("Unknown pattern ~p ~n", [Term])
    end,
    CtrIn + 1.
%% This formats the string accordingly
format_item(Str) ->
    StrTrim = string:trim(Str), %% first, trim it
    format_as_needed(StrTrim).
format_as_needed(Str) ->
    Float = (catch erlang:list_to_float(Str)),
    case Float of
        {'EXIT', _} -> %% It is not a float -> check if it is an integer
            Int = (catch erlang:list_to_integer(Str)),
            case Int of
                {'EXIT', _} -> %% It is not an integer -> return as is (string)
                    Str;
                _ -> %% It is an int
                    Int
            end;
        _ -> %% It is a float
            Float
    end.

Erlang: syntax error before: ","word"

I have the following functions:
search(DirName, Word) ->
    NumberedFiles = list_numbered_files(DirName),
    Words = make_filter_mapper(Word),
    Index = mapreduce(NumberedFiles, Words, fun remove_duplicates/3),
    dict:find(Word, Index).
list_numbered_files(DirName) ->
    {ok, Files} = file:list_dir(DirName),
    FullFiles = [ filename:join(DirName, File) || File <- Files ],
    Indices = lists:seq(1, length(Files)),
    lists:zip(Indices, FullFiles). % {Index, FileName} tuples
make_filter_mapper(MatchWord) ->
    fun (_Index, FileName, Emit) ->
        {ok, [Words]} = file:consult(FileName), %% <---- Line 20
        lists:foreach(fun (Word) ->
            case MatchWord == Word of
                true -> Emit(Word, FileName);
                false -> false
            end
        end, Words)
    end.
remove_duplicates(Word, FileNames, Emit) ->
    UniqueFiles = sets:to_list(sets:from_list(FileNames)),
    lists:foreach(fun (FileName) -> Emit(Word, FileName) end, UniqueFiles).
However, when I call search(Path_to_Dir, Word), I get:
Error in process <0.185.0> with exit value:
{{badmatch,{error,{1,erl_parse,["syntax error before: ","wordinfile"]}}},
[{test,'-make_filter_mapper/1-fun-1-',4,[{file,"test.erl"},{line,20}]}]}
And I do not understand why. Any ideas?
The pattern {ok, [Words]} only matches when the file contains exactly one term, but file:consult/1 returns a list with one element per term in the file, which may be many. Try matching {ok, Words} instead of {ok, [Words]}.
Besides the fact that file:consult/1 may return a list of several elements, so you should replace {ok, [Words]} (which expects a list of exactly one element, Words) with {ok, Words}, the call is actually returning a syntax error, meaning that the file you are reading contains a syntax error.
Remember that the file should contain only valid Erlang terms, each terminated by a dot. The most common mistake is to forget a dot or to use a comma in its place.
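To illustrate the file format that file:consult/1 expects, here is a small sketch. The module name consult_demo, the function run/0 and the file name consult_demo.txt are all invented for the example; it writes two dot-terminated terms and reads them back:

```erlang
-module(consult_demo).
-export([run/0]).

%% Writes a small file of dot-terminated terms, then reads it back.
%% The file name "consult_demo.txt" is arbitrary.
run() ->
    ok = file:write_file("consult_demo.txt",
                         <<"[\"word1\", \"word2\"].\n{count, 2}.\n">>),
    %% file:consult/1 returns a list with one element per term in the file,
    %% so match on {ok, Terms} rather than {ok, [Term]}.
    {ok, Terms} = file:consult("consult_demo.txt"),
    Terms.
```

Here Terms holds two elements, the list of words and the tuple, which is why a pattern like {ok, [Words]} would fail with badmatch on such a file even when its syntax is valid.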

Cowboy web server application very slow

I am currently playing around with minimal web servers, like Cowboy. I want to pass a number in the URL, load lines of a file, sort these lines and print the element in the middle to test IO and sorting.
So the code loads the path like /123, makes a padded "00123" out of the number, loads the file "input00123.txt" and sorts its content and then returns something like "input00123.txt 0.50000".
At the same time I have a test tool which makes 50 simultaneous requests, of which only 2 get answered; the rest time out.
My handler looks like the following:
-module(toppage_handler).
-export([init/3]).
-export([handle/2]).
-export([terminate/3]).
init(_Transport, Req, []) ->
    {ok, Req, undefined}.
readlines(FileName) ->
    {ok, Device} = file:open(FileName, [read]),
    get_all_lines(Device, []).
get_all_lines(Device, Accum) ->
    case io:get_line(Device, "") of
        eof -> file:close(Device), Accum;
        Line -> get_all_lines(Device, Accum ++ [Line])
    end.
handle(Req, State) ->
    {PathBin, _} = cowboy_req:path(Req),
    case PathBin of
        <<"/">> -> Output = <<"Hello, world!">>;
        _ -> PathNum = string:substr(binary_to_list(PathBin),2),
             Num = string:right(PathNum, 5, $0),
             Filename = string:concat("input",string:concat(Num, ".txt")),
             Filepath = string:concat("../data/",Filename),
             SortedLines = lists:sort(readlines(Filepath)),
             MiddleIndex = erlang:trunc(length(SortedLines)/2),
             MiddleElement = lists:nth(MiddleIndex, SortedLines),
             Output = iolist_to_binary(io_lib:format("~s\t~s",[Filename,MiddleElement]))
    end,
    {ok, ReqRes} = cowboy_req:reply(200, [], Output, Req),
    {ok, ReqRes, State}.
terminate(_Reason, _Req, _State) ->
    ok.
I am running this on Windows to compare it with .NET. Is there anything to make this more performant, like running the sorting/IO in threads or how can I improve it? Running with cygwin didn't change the result a lot, I got about 5-6 requests answered.
Thanks in advance!
The most glaring issue: get_all_lines is O(N^2) because list concatenation (++) is O(N). Erlang's list type is a singly linked list. The typical approach here is to use the "cons" operator, prepending to the head of the list, and reverse the accumulator at the end:
get_all_lines(Device, Accum) ->
    case io:get_line(Device, "") of
        eof -> file:close(Device), lists:reverse(Accum);
        Line -> get_all_lines(Device, [Line | Accum])
    end.
Pass the binary flag to file:open/2 to get binaries instead of strings (which are just lists of characters in Erlang); they are much more memory- and CPU-friendly.
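A minimal sketch of working with binaries, assuming the input files are small enough to read in one go. The names lines_demo and read_lines/1 are illustrative and not from the question, and the sketch uses file:read_file/1 with binary:split/3 rather than file:open/2 with the binary flag, which is a related but different route to the same data type:

```erlang
-module(lines_demo).
-export([read_lines/1]).

%% Read the whole file as one binary and split it on "\n".
%% Empty lines, including the one produced by a trailing newline,
%% are dropped. Files with "\r\n" endings would keep the "\r".
read_lines(FileName) ->
    {ok, Bin} = file:read_file(FileName),
    [Line || Line <- binary:split(Bin, <<"\n">>, [global]), Line =/= <<>>].
```

The resulting list of binaries can be passed to lists:sort/1 just like a list of strings.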

Racket ormap in erlang

Is there a better way to implement Racket's ormap in Erlang than:
ormap(_, []) -> false;
ormap(Pred, [H|T]) ->
    case Pred(H) of
        false -> ormap(Pred, T);
        _ -> {ok, Pred(H)}
    end.
Looks pretty good to me. I'm not sure how smart Erlang is about optimizing these things, but you might want to actually bind the non-false pattern match to a variable, and avoid recomputing Pred(H).
ormap(_, []) -> false;
ormap(Pred, [H|T]) ->
    case Pred(H) of
        false -> ormap(Pred, T);
        V -> {ok, V}
    end.
The Racket version doesn't include the ok symbol, but that seems like the Erlangy thing to do so I don't see anything wrong with it. You might similarly expect Pred to return an attached ok symbol for the non-false case, in which case:
V -> V
or
{ok, V} -> {ok, V}
should work.
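As a quick sketch of the V -> V variant in use (the module name ormap_demo is made up for the example), the result is the first non-false value returned by Pred, which mirrors Racket's behaviour:

```erlang
-module(ormap_demo).
-export([ormap/2]).

%% Returns the first non-false result of Pred over the list,
%% or false if Pred returns false for every element.
ormap(_, []) -> false;
ormap(Pred, [H|T]) ->
    case Pred(H) of
        false -> ormap(Pred, T);
        V -> V
    end.
```

For example, ormap_demo:ormap(fun(X) -> X > 2 andalso X * 10 end, [1,2,3]) evaluates to 30, because andalso returns its second operand when the first is true.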

How do I create a temp filename in Erlang?

I need to put data in a file since my other function takes a file as input.
How do I create a unique filename in Erlang?
Does something like unix "tempfile" exist?
Do you mean just generating the actual filename? In that case the safest way would be to use a mix of the numbers you get from now() and the hostname of your computer (if you have several nodes doing the same thing).
Something like:
1> {A,B,C}=now().
{1249,304278,322000}
2> N=node().
nonode@nohost
3> lists:flatten(io_lib:format("~p-~p.~p.~p",[N,A,B,C])).
"nonode@nohost-1249.304278.322000"
4>
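Note that erlang:now/0 has been deprecated since OTP 18. A sketch of the same idea using erlang:unique_integer/1 instead; the module tmpname_demo, the function unique_name/1 and the Prefix argument are assumptions made for illustration. The integer is unique only within the lifetime of one runtime system instance, so names may repeat after a node restart:

```erlang
-module(tmpname_demo).
-export([unique_name/1]).

%% Build a name from a caller-supplied prefix, the node name and a
%% per-node unique positive integer. Prefix is assumed to be a string.
unique_name(Prefix) ->
    N = erlang:unique_integer([positive]),
    lists:flatten(io_lib:format("~s-~s-~b", [Prefix, node(), N])).
```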
You can also use TMP = lib:nonl(os:cmd("mktemp")).
Or you could do
erlang:phash2(make_ref())
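for a quick and easy unique identifier. make_ref() itself is unique for up to approximately 2^82 calls, which should be enough for your purposes, although phash2/1 maps the reference into the range 0..2^27-1, so hash collisions are possible. I find this easier than formatting a timestamp with the node name.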
Late answer: I just noticed the test_server module which has scratch directory support, worth a look
http://www.erlang.org/doc/man/test_server.html#temp_name-1
I've finally had this problem -- and my user is using a mix of Windows and Linux systems, so the old tried-and-true lib:nonl(os:cmd("mktemp")) method is just not going to cut it anymore.
So here is how I've approached it, both with a mktemp/1 function that returns a filename that can be used and also a mktemp_dir/1 function that returns a directory (after having created it).
-spec mktemp(Prefix) -> Result
    when Prefix :: string(),
         Result :: {ok, TempFile :: file:filename()}
                 | {error, Reason :: file:posix()}.
mktemp(Prefix) ->
    Rand = integer_to_list(binary:decode_unsigned(crypto:strong_rand_bytes(8)), 36),
    TempPath = filename:basedir(user_cache, Prefix),
    TempFile = filename:join(TempPath, Rand),
    Result1 = filelib:ensure_dir(TempFile),
    Result2 = file:write_file(TempFile, <<>>),
    case {Result1, Result2} of
        {ok, ok} -> {ok, TempFile};
        {ok, Error} -> Error;
        {Error, _} -> Error
    end.
And the directory version:
-spec mktemp_dir(Prefix) -> Result
    when Prefix :: string(),
         Result :: {ok, TempDir :: file:filename()}
                 | {error, Reason :: file:posix()}.
mktemp_dir(Prefix) ->
    Rand = integer_to_list(binary:decode_unsigned(crypto:strong_rand_bytes(8)), 36),
    TempPath = filename:basedir(user_cache, Prefix),
    TempDir = filename:join(TempPath, Rand),
    Result1 = filelib:ensure_dir(TempDir),
    Result2 = file:make_dir(TempDir),
    case {Result1, Result2} of
        {ok, ok} -> {ok, TempDir};
        {ok, Error} -> Error;
        {Error, _} -> Error
    end.
Both of these do basically the same thing: we get a strongly random name as a binary, convert that to a base36 string, and append it to whatever the OS returns to us as a safe user-local temporary cache location.
On a unix type system, of course, we could just use filename:join(["/tmp", Prefix, Rand]) but the unavailability of /tmp on Windows is sort of the whole point here.
In OTP 24 there is no file:ensure_dir, so I've made something similar:
For directory:
mktemp_dir(Prefix) ->
    Rand = integer_to_list(binary:decode_unsigned(crypto:strong_rand_bytes(8)), 36),
    TempDir = filename:basedir(user_cache, Prefix),
    [] = os:cmd("mkdir " ++ "\"" ++ TempDir ++ "\""),
    {ok, _} = file:list_dir(TempDir),
    TempDir.
For file:
mktemp(Prefix) ->
    Rand = integer_to_list(binary:decode_unsigned(crypto:strong_rand_bytes(8)), 36),
    TempDir = filename:basedir(user_cache, Prefix),
    TempFile = filename:join(TempDir, Rand),
    [] = os:cmd("mkdir " ++ "\"" ++ TempDir ++ "\""),
    {ok, _} = file:list_dir(TempDir),
    Result = file:write_file(TempFile, <<>>),
    case {Result} of
        {ok} -> {ok, TempFile};
        {Error} -> Error
    end.