I'm trying to create a very simple DSL that takes a string formatted like
GET /endpoint controller.action1 |> controller.action2
And turn it into something along the lines of
{"GET", "/endpoint", [{controller.action1}, {controller.action2}]}
My Leex file is this:
Definitions.
Rules.
GET|PUT|POST|DELETE|PATCH : {token, {method, TokenLine, TokenChars}}.
/[A-Za-z_]+ : {token, {endpoint, TokenLine, TokenChars}}.
[A-Za-z0-9_]+\.[A-Za-z0-9_]+ : {token, {function, TokenLine, splitControllerAction(TokenChars)}}.
\|\> : {token, {pipe, TokenLine}}.
[\s\t\n\r]+ : skip_token.
Erlang code.
splitControllerAction(A) ->
[Controller, Action] = string:tokens(A, "."),
{list_to_atom(Controller), list_to_atom(Action)}.
And my Yecc file looks like this:
Nonterminals route actionlist elem.
Terminals function endpoint method pipe.
Rootsymbol route.
route -> method endpoint actionlist : {$1, $2, $3}.
actionlist -> elem : [$1].
actionlist -> elem 'pipe' actionlist : [$1 | $3].
elem -> function : $1.
Erlang code.
extract_token({_Token, _Line, Value}) -> _Token.
The output I'm getting with this:
2> {ok, Fart, _} = blah:string("GET /asdfdsf dasfadsf.adsfasdf |> adsfsdf.adsfdf").
{ok,[{method,1,"GET"},
{endpoint,1,"/asdfdsf"},
{function,1,{dasfadsf,adsfasdf}},
{pipe,1},
{function,1,{adsfsdf,adsfdf}}],
1}
3> blah_parser:parse(Fart).
{ok,{49,50,51}}
Turns out you need to surround $1 with single quotes ('$1'); otherwise yecc treats it as an Erlang character literal, so it just evaluates to the ASCII value.
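For reference, this is what the grammar looks like with the pseudo-variables quoted (a sketch based on the grammar above; note the rules still return the whole token tuples, so extracting just the values would take a small helper):

```erlang
Nonterminals route actionlist elem.
Terminals function endpoint method pipe.
Rootsymbol route.

route -> method endpoint actionlist : {'$1', '$2', '$3'}.

actionlist -> elem : ['$1'].
actionlist -> elem pipe actionlist : ['$1' | '$3'].

elem -> function : '$1'.
```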
-Thomas Gebert.
Is it possible to write a function that returns an anonymous function of a specified arity? I'd like to generate a function that can be passed to meck:expect/3 as the third argument, so I can dynamically mock existing functions of any arity.
I've done quite a bit of searching and it seems like the only way to solve this is by hardcoding things like this:
gen_fun(1, Function) ->
fun(A) -> Function([A]) end;
gen_fun(2, Function) ->
fun(A, B) -> Function([A, B]) end;
...
It's not pretty, but you can use the same trick as the shell and build your functions from the ground up:
-module(funny).
-export([gen_fun/1, gen_fun/2]).

-spec gen_fun(function()) -> function().
gen_fun(Function) ->
    {arity, Arity} = erlang:fun_info(Function, arity),
    gen_fun(Arity, Function).

-spec gen_fun(non_neg_integer(), function()) -> function().
gen_fun(Arity, Function) ->
    Anno = erl_anno:new(1),
    Params = [{var, Anno, list_to_atom([$X | integer_to_list(I)])} || I <- lists:seq(1, Arity)],
    Expr =
        {'fun',
         Anno,
         {clauses, [{clause, Anno, Params, [], [{call, Anno, {var, Anno, 'Function'}, Params}]}]}},
    {value, Fun, _Vars} = erl_eval:expr(Expr, [{'Function', Function}]),
    Fun.
Then, in the shell…
1> F = funny:gen_fun(fun io:format/2).
#Fun<erl_eval.43.40011524>
2> F("~ts~n", ["π"]).
π
ok
3> F1 = funny:gen_fun(fun io:format/1).
#Fun<erl_eval.44.40011524>
4> F1("π~n").
π
ok
5>
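As a quick sanity check (with one caveat: unlike the hardcoded gen_fun/2 sketch in the question, this version applies Function to the arguments positionally rather than as a single list, so the wrapped fun must have the same arity):

```erlang
%% In the shell:
F = funny:gen_fun(3, fun(A, B, C) -> A + B + C end),
{arity, 3} = erlang:fun_info(F, arity),
6 = F(1, 2, 3).
```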
-module(solarSystem).
-export([process_csv/1, is_numeric/1, parseALine/2, parse/1, expandT/1, expandT/2,
parseNames/1]).
parseALine(false, T) ->
T;
parseALine(true, T) ->
T.
parse([Name, Colour, Distance, Angle, AngleVelocity, Radius, "1" | T]) ->
T;%Where T is a list of names of other objects in the solar system
parse([Name, Colour, Distance, Angle, AngleVelocity, Radius | T]) ->
T.
parseNames([H | T]) ->
H.
expandT(T) ->
T.
expandT([], Sep) ->
[];
expandT([H | T], Sep) ->
T.
% https://rosettacode.org/wiki/Determine_if_a_string_is_numeric#Erlang
is_numeric(L) ->
S = trim(L, ""),
Float = (catch erlang:list_to_float(S)),
Int = (catch erlang:list_to_integer(S)),
is_number(Float) orelse is_number(Int).
trim(A) ->
A.
trim([], A) ->
A;
trim([32 | T], A) ->
trim(T, A);
trim([H | T], A) ->
trim(T, A ++ [H]).
process_csv(L) ->
X = parse(L),
expandT(X).
The problem is that process_csv/1 will be called from a main function, and L will be a line from a file like this:
[["name "," col"," dist"," a"," angv"," r "," ..."],["apollo11 ","white"," 0.1"," 0"," 77760"," 0.15"]]
Or like this:
["planets ","earth","venus "]
Or like this:
["a","b"]
I need to display it as follows:
apollo11 =["white", 0.1, 0, 77760, 0.15,[]];
Planets =[earth,venus]
a,b
[[59],[97],[44],[98]]
My problem is that no matter what changes I make, only part of the output is shown, and the separators are missing. I can't find a way to split the list.
In addition, because Erlang is a niche programming language, I can't find many examples online.
So, can anyone help me? Thank you very much.
One more restriction: I am not allowed to use recursion.
I think the first problem is that it is hard to link what you are trying to achieve with what your code says so far. Therefore, this feedback may not be exactly what you are looking for, but it might give you some ideas. Let's structure the problem into the common elements: (1) input, (2) process, and (3) output.
Input
You mentioned that L will be a file, but I assume it is a line in a file, where each line can be one of the three samples. Note that the samples do not follow a consistent pattern. For this, we can build a function to convert each line of the file into an Erlang term and pass the result to the next step.
Process
The question also does not mention the specific logic for parsing/processing the input. You also seem to care about the data types, so we will convert and display the results accordingly. Erlang, as a functional language, naturally handles lists, so in most cases we will need functions from the lists module.
Output
You didn't specifically mention where you want to display the result (an output file, the screen/Erlang shell, etc.), so let's assume you just want to display it on the standard output/Erlang shell.
Sample file content test1.txt (please note the dot at the end of each line)
[["name "," col"," dist"," a"," angv"," r "],["apollo11 ","white","0.1"," 0"," 77760"," 0.15"]].
["planets ","earth","venus "].
["a","b"].
Howto run: solarSystem:process_file("/Users/macbook/Documents/test1.txt").
Sample Result:
(dev01#Macbooks-MacBook-Pro-3)3> solarSystem:process_file("/Users/macbook/Documents/test1.txt").
apollo11 = ["white",0.1,0,77760,0.15]
planets = ["earth","venus"]
a = ["b"]
Done processing 3 line(s)
ok
Module code:
-module(solarSystem).
-export([process_file/1]).
-export([process_line/2]).
-export([format_item/1]).
%%This is the main function, input is file full path
%%Howto call: solarSystem:process_file("file_full_path").
process_file(Filename) ->
    %%Use file:consult to convert the file content into erlang terms
    %%File content is a dot (".") separated line
    {StatusOpen, Result} = file:consult(Filename),
    case StatusOpen of
        ok ->
            %%Result is a list and therefore each element must be handled using lists function
            Ctr = lists:foldl(fun process_line/2, 0, Result),
            io:format("Done processing ~p line(s) ~n", [Ctr]);
        _ -> %%This is for the case where file not available
            io:format("Error converting file ~p due to '~p' ~n", [Filename, Result])
    end.

process_line(Term, CtrIn) ->
    %%Assume there are few possibilities of element. There are so many ways to process the data as long as the input pattern is clear.
    %%We basically need to identify all possibilities and handle them accordingly.
    %%Of course there are smarter (dynamic) ways to handle them, but below may give you some ideas.
    case Term of
        %%1. This is to handle this pattern -> [["name "," col"," dist"," a"," angv"," r "],["apollo11 ","white"," 0.1"," 0"," 77760"," 0.15"]]
        [[_, _, _, _, _, _], [Name | OtherParams]] ->
            %%At this point, Name = "apollo11", OtherParamsList = ["white"," 0.1"," 0"," 77760"," 0.15"]
            OtherParamsFmt = lists:map(fun format_item/1, OtherParams),
            %%Display the result to standard output
            io:format("~s = ~p ~n", [string:trim(Name), OtherParamsFmt]);
        %%2. This is to handle this pattern -> ["planets ","earth","venus "]
        [Name | OtherParams] ->
            %%At this point, Name = "planets ", OtherParamsList = ["earth","venus "]
            OtherParamsFmt = lists:map(fun format_item/1, OtherParams),
            %%Display the result to standard output
            io:format("~s = ~p ~n", [string:trim(Name), OtherParamsFmt]);
        %%3. Other cases
        _ ->
            %%Display the warning to standard output
            io:format("Unknown pattern ~p ~n", [Term])
    end,
    CtrIn + 1.

%%This is to format the string accordingly
format_item(Str) ->
    StrTrim = string:trim(Str), %%first, trim it
    format_as_needed(StrTrim).

format_as_needed(Str) ->
    Float = (catch erlang:list_to_float(Str)),
    case Float of
        {'EXIT', _} -> %%It is not a float -> check if it is an integer
            Int = (catch erlang:list_to_integer(Str)),
            case Int of
                {'EXIT', _} -> %%It is not an integer -> return as is (string)
                    Str;
                _ -> %%It is an int
                    Int
            end;
        _ -> %%It is a float
            Float
    end.
I'm using yecc to parse my tokenized asm-like code. After providing code like "MOV [1], [2]\nJMP hello" and running the lexer, this is what I get in response:
[{:opcode, 1, :MOV}, {:register, 1, 1}, {:",", 1}, {:register, 1, 2},
{:opcode, 2, :JMP}, {:identifer, 2, :hello}]
When I parse this I'm getting
[%{operation: [:MOV, [:REGISTER, 1], [:REGISTER, 2]]},
%{operation: [:JMP, [:CONST, :hello]]}]
But I want every operation to carry its line number, in order to produce meaningful errors further down the pipeline.
So I changed my parser to this:
Nonterminals
code statement operation value.
Terminals
label identifer integer ',' opcode register address address_in_register line_number.
Rootsymbol code.
code -> line_number statement : [{get_line('$1'), '$2'}].
code -> line_number statement code : [{get_line('$1'), '$2'} | '$3'].
%code -> statement : ['$1'].
%code -> statement code : ['$1' | '$2'].
statement -> label : #{'label' => label('$1')}.
statement -> operation : #{'operation' => '$1'}.
operation -> opcode value ',' value : [operation('$1'), '$2', '$4'].
operation -> opcode value : [operation('$1'), '$2'].
operation -> opcode identifer : [operation('$1'), value('$2')].
operation -> opcode : [operation('$1')].
value -> integer : value('$1').
value -> register : value('$1').
value -> address : value('$1').
value -> address_in_register : value('$1').
Erlang code.
get_line({_, Line, _}) -> Line.
operation({opcode, _, OpcodeName}) -> OpcodeName.
label({label, _, Value}) -> Value.
value({identifer, _, Value}) -> ['CONST', Value];
value({integer, _, Value}) -> ['CONST', Value];
value({register, _, Value}) -> ['REGISTER', Value];
value({address, _, Value}) -> ['ADDRESS', Value];
value({address_in_register, _, Value}) -> ['ADDRESS_IN_REGISTER', Value].
(commented code is old, working rule)
Now I'm getting
{:error, {1, :assembler_parser, ['syntax error before: ', ['\'MOV\'']]}}
after providing the same input. How can I fix this?
My suggestion is to keep the line numbers in the tokens and not as separate tokens and then change how you build the operations.
So I would suggest this:
operation -> opcode value ',' value : [operation('$1'), line('$1'), '$2', '$4'].
operation -> opcode value : [operation('$1'), line('$1'), '$2'].
operation -> opcode identifer : [operation('$1'), line('$1'), value('$2')].
operation -> opcode : [operation('$1'), line('$1')].
line({_, Line, _}) -> Line.
Or even this if you want to mirror Elixir AST:
operation -> opcode value ',' value : {operation('$1'), meta('$1'), ['$2', '$4']}.
operation -> opcode value : {operation('$1'), meta('$1'), ['$2']}.
operation -> opcode identifer : {operation('$1'), meta('$1'), [value('$2')]}.
operation -> opcode : {operation('$1'), meta('$1'), []}.
meta({_, Line, _}) -> [{line, Line}].
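With the Elixir-AST-style rules above and the token stream from the question, the first operation should come out roughly like this (a sketch derived by hand from the rules, not a verified run):

```erlang
%% Tokens: {opcode,1,'MOV'}, {register,1,1}, {',',1}, {register,1,2}
%% matched by: operation -> opcode value ',' value
{'MOV', [{line, 1}], [['REGISTER', 1], ['REGISTER', 2]]}
%% which Elixir would display as:
%%   {:MOV, [line: 1], [[:REGISTER, 1], [:REGISTER, 2]]}
```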
I have the following functions:
search(DirName, Word) ->
NumberedFiles = list_numbered_files(DirName),
Words = make_filter_mapper(Word),
Index = mapreduce(NumberedFiles, Words, fun remove_duplicates/3),
dict:find(Word, Index).
list_numbered_files(DirName) ->
{ok, Files} = file:list_dir(DirName),
FullFiles = [ filename:join(DirName, File) || File <- Files ],
Indices = lists:seq(1, length(Files)),
lists:zip(Indices, FullFiles). % {Index, FileName} tuples
make_filter_mapper(MatchWord) ->
fun (_Index, FileName, Emit) ->
{ok, [Words]} = file:consult(FileName), %% <---- Line 20
lists:foreach(fun (Word) ->
case MatchWord == Word of
true -> Emit(Word, FileName);
false -> false
end
end, Words)
end.
remove_duplicates(Word, FileNames, Emit) ->
UniqueFiles = sets:to_list(sets:from_list(FileNames)),
lists:foreach(fun (FileName) -> Emit(Word, FileName) end, UniqueFiles).
However, when I call search(Path_to_Dir, Word) I get:
Error in process <0.185.0> with exit value:
{{badmatch,{error,{1,erl_parse,["syntax error before: ","wordinfile"]}}},
[{test,'-make_filter_mapper/1-fun-1-',4,[{file,"test.erl"},{line,20}]}]}
And I do not understand why. Any ideas?
The Words variable will match the content of the list, which might not be just one tuple but many of them. Try matching {ok, Words} instead of {ok, [Words]}.
Besides the fact that file:consult/1 may return a list of several elements (so you should replace {ok, [Words]}, which expects a single-element list, with {ok, Words}), it is actually returning a syntax error, meaning there is a syntax error in the file you are reading.
Remember that the file should contain only valid Erlang terms, each of them terminated by a dot. The most common error is forgetting a dot or replacing it with a comma.
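As a minimal illustration of the dot rule (file names here are made up for the example):

```erlang
%% Write two dot-terminated terms, then read them back:
ok = file:write_file("terms.txt", "{words, [\"wordinfile\"]}.\n[1, 2, 3].\n"),
{ok, [{words, ["wordinfile"]}, [1, 2, 3]]} = file:consult("terms.txt").
```

If a dot is missing or replaced by a comma, file:consult/1 instead returns {error, {Line, erl_parse, Reason}}, which is exactly the tuple inside the badmatch shown above.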
In my grammar I have something like this:
line : startWord (matchPhrase|
anyWord matchPhrase|
anyWord anyWord matchPhrase|
anyWord anyWord anyWord matchPhrase|
anyWord anyWord anyWord anyWord matchPhrase)
-> ^(TreeParent startWord anyWord* matchPhrase);
So I want to match the first occurrence of matchPhrase, but I will allow up to a certain number of anyWord before it. The tokens that make up matchPhrase are also matched by anyWord.
Is there a better way of doing this?
I think it might be possible by combining the semantic predicate in this answer with the non-greedy option:
(options {greedy=false;} : anyWord)*
but I can't figure out exactly how to do this.
Edit: Here's an example. I want to extract information from the following sentences:
Picture of a red flower.
Picture of the following: A red flower.
My input is actually tagged English sentences, and the Lexer rules match the tags rather than the words. So the input to ANTLR is:
NN-PICTURE Picture IN-OF of DT a JJ-COLOR red NN-FLOWER flower
NN-PICTURE Picture IN-OF of DT the VBG following COLON : DT a JJ-COLOR red NN-FLOWER flower
I have lexer rules for each tag like this:
WS : (' ')+ {skip();};
TOKEN : (~' ')+;
nnpicture:'NN-PICTURE' TOKEN -> ^('NN-PICTURE' TOKEN);
vbg:'VBG' TOKEN -> ^('VBG' TOKEN);
And my parser rules are something like this:
sentence : nnpicture inof matchFlower;
matchFlower : (dtTHE|dt)? jjcolor? nnflower;
But of course this will fail on the second sentence. So I want to allow a bit of flexibility by allowing up to N tokens before the flower match. I have an anyWord token that matches anything, and the following works:
sentence : nnpicture inof ( matchFlower |
anyWord matchFlower |
anyWord anyWord matchFlower | etc.
but it isn't very elegant, and doesn't work well with large N.
You can do that by first checking inside the matchFlower rule if there really is dt? jjcolor? nnflower ahead in its token-stream using a syntactic predicate. If such tokens can be seen, simply match them, if not, match any token, and recursively match matchFlower. This would look like:
matchFlower
: (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower)
| . matchFlower -> matchFlower
;
Note that the . (dot) inside a parser rule does not match any character, but any token.
Here's a quick demo:
grammar T;
options {
output=AST;
}
tokens {
TEXT;
SENTENCE;
FLOWER;
}
parse
: sentence+ EOF -> ^(TEXT sentence+)
;
sentence
: nnpicture inof matchFlower -> ^(SENTENCE nnpicture inof matchFlower)
;
nnpicture
: NN_PICTURE TOKEN -> ^(NN_PICTURE TOKEN)
;
matchFlower
: (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower)
| . matchFlower -> matchFlower
;
inof
: IN_OF (t=IN | t=OF) -> ^(IN_OF $t)
;
dt
: DT (t=THE | t=A) -> ^(DT $t)
;
jjcolor
: JJ_COLOR TOKEN -> ^(JJ_COLOR TOKEN)
;
nnflower
: NN_FLOWER TOKEN -> ^(NN_FLOWER TOKEN)
;
IN_OF : 'IN-OF';
NN_FLOWER : 'NN-FLOWER';
DT : 'DT';
A : 'a';
THE : 'the';
IN : 'in';
OF : 'of';
VBG : 'VBG';
NN_PICTURE : 'NN-PICTURE';
JJ_COLOR : 'JJ-COLOR';
TOKEN : ~' '+;
WS : ' '+ {skip();};
A parser generated from the grammar above would parse your input:
NN-PICTURE Picture IN-OF of DT the VBG following COLON : DT a JJ-COLOR red NN-FLOWER flower
as follows:
As you can see, everything before the flower is omitted from the tree. If you want to keep these tokens in there, do something like this:
grammar T;
// ...
tokens {
// ...
NOISE;
}
// ...
matchFlower
: (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower)
| t=. matchFlower -> ^(NOISE $t) matchFlower
;
// ...
resulting in the following AST: