Using menhir with sedlex - parsing

I need to use menhir with sedlex for whatever reason (utf-8), but don't know how to make the generated parser depend on Sedlexing instead of Lexing. Any tips?
When I run
menhir --infer parser.mly
the generated program has lines with Lexing.... I could change it manually, but there must be another way, no?

EDIT: The parser.ml that is generated should have references to Lexing. Sedlexing is used to create the lexbuf that you send in to the parser, but the parser doesn't care if that lexbuf was created by Lexing or Sedlexing, as long as it can use functions like Lexing.lex_start_p and Lexing.lex_curr_p on it.
I used something like
ocamlbuild -use-menhir -tag thread -use-ocamlfind -quiet -pkg menhirLib \
-pkg sedlex test.native
where test.ml uses parser.mly via calls to Parser.
For completeness, the command that's run by ocamlbuild is:
menhir --ocamlc 'ocamlfind ocamlc -thread -package sedlex -package menhirLib' \
--explain --infer parser.mly
See a full example at https://github.com/unhammer/ocaml_cg_streamparse
(branch https://github.com/unhammer/ocaml_cg_streamparse/tree/match-singlechar-example shows a rule that matches a single code point like a or ß but not aa).

Related

Can Erlang source code be embedded in Elixir code? If so, how?

Elixir source may be injected using Code.eval_string/3. I don't see mention of running raw Erlang code in the docs:
https://hexdocs.pm/elixir/Code.html#eval_string/3
I am coming from a Scala world in which Java objects are callable using Scala syntax, and Scala is compiled into Java and visible by intercepting the compiler output (directly generated with scalac).
I get the sense that Elixir does not provide such interoperating features, nor allow injection of custom Erlang into the runtime. Is this the case?
You can use the erlang standard library modules from Elixir, as described here or here.
For example:
def random_integer(upper) do
:rand.uniform(upper) # rand is an erlang library
end
You can also add erlang packages to your mix.exs dependencies and use them in your project, as long as these packages are published on hex or on github.
You can also use erlang and elixir code together in a project as described here.
So yeah, it's perfectly possible to call erlang code from elixir.
Vice-versa is also possible, see here for more information:
Elixir compiles into BEAM byte code (via Erlang Abstract Format). This
means that Elixir code can be called from Erlang and vice versa,
without the need to write any bindings.
Expanding what #zwippie have written:
All remote function calls (by that I mean calling function with explicitly set module/alias) are in form of:
<atom with module name>.<function name>(<arguments>)
# Technically it is the same as:
# apply(module, function_name_as_atom, [arguments])
And all "upper case module names" in Elixir are just atoms:
is_atom(Foo) == true
Foo == :"Elixir.Foo" # => true
So from Elixir viewpoint there is no difference between calling Erlang functions and Elixir functions. It is just different atom passed as the receiving module.
So you can easily call Erlang modules from Elixir. That mean that without much of the hassle you should be able to compile Erlang AST from within Elixir as well:
"rand:uniform(100)"
|> :merl.quote()
|> :erl_eval.expr(#{})
No need for any mental translation.
Additionally you can without any problems mix Erlang and Elixir code in single Mix project. With tree structure like:
.
|`- mix.exs
|`- src
| `- example.erl
`- lib
`- example.ex
Where example.erl is:
-module(example).
-export([hello/0]).
hello() -> <<"World">>.
And example.ex:
defmodule Example do
def print_hello, do: IO.puts(:example.hello())
end
You can compile project and run it with
mix run -e "Example.print_hello()"
And see that Erlang module was successfully compiled and executed from within Elixir code in the same project without problems.
One more thing to watch for when calling erlang code from elixir. erlang uses charlists for strings. When you call a erlang function that takes a string, convert the string to a charlist and convert returned string to a string.
Examples:
iex(17)> :string.to_upper "test"
** (FunctionClauseError) no function clause matching in :string.to_upper/1
The following arguments were given to :string.to_upper/1:
# 1
"test"
(stdlib 3.15.1) string.erl:2231: :string.to_upper/1
iex(17)> "test" |> String.to_charlist() |> :string.to_upper
'TEST'
iex(18)> "test" |> String.to_charlist() |> :string.to_upper |> to_string
"TEST"
iex(19)>

How to get automake to recognize the non-default filename produced by flex when %option prefix= is set

In general, automake's capability to properly invoke flex and bison when building parsers and scanners is incredibly useful. However, I'm running up against an issue that I can't seem to resolve.
I have a lex file, trigraphs.l, which performs the text replacement of C trigraph sequences. There will be multiple lexers included in the final executable, which will be a pre-preprocessor that prepares the source file for the C preprocessor proper, so a subsequent lexer will take care of joining logical lines that span multiple physical lines, another one will strip out comments, etc. Per the C standard, there is a specific order these transformations are supposed to take place in, so to avoid issues I'm using three separate lexers that will run in the proper sequence.
Anyway, to avoid symbol name conflicts I'm using %option prefix="blah," so my trigraph scanner is as follows:
%option noyywrap
%option prefix="trigraphs_"
%%
"??<" { printf("{"); }
"??>" { printf("}"); }
"??(" { printf("["); }
"??)" { printf("]"); }
"??=" { printf("#"); }
"??/" { printf("\\"); }
"??'" { printf("^"); }
"??!" { printf("|"); }
"??-" { printf("~"); }
. { printf("%c", *yytext); }
%%
The issue seems to be that automake->ylwrap is expecting an output from flex with the traditional lex.yy.c filename, to rename to trigraphs.c. However, because the %option prefix also changes the output filename (to lex.trigraphs_.c), it doesn't get renamed to the trigraphs.c that the Makefile is expecting. This causes the obvious compilation error when the Makefile wants to compile trigraphs.c and it doesn't exist.
One solution that occurred to me is to use %option outfile="lex.yy.c" to sidestep around that. It seems to work so far; however, it feels a bit "hackish" so I wonder if there's a more idiomatic or canonical way to handle this. When searching for solutions others had found, I did come across this question on StackExchange, but it's eight years old now so it's possible that there have been developments in the interim.
There is no more idiomatic way to handle this because Automake rules know nothing about flex. Automake expects to work with project that are compatible with POSIX lex, and expects any lex implementation to follow the POSIX convention of producing a lex.yy.c file.

Get list of #defines as Token or string in Premake 5

I am using a custom build command to run the nasm assembler on a .asm file in my C++ project. I am using %idefs in the assembler code to only compile the code I need. I am checking for the same #defines as in the C++-Code and use define() in Premake 5 to set those, but additionally I need to pass them to nasm on its command line invocation in my Custom Build Command. What I am looking for is a way to concatenate or string replace the Premake internal list of #defines into the command line invocation string of the buildcommands() call. Is there a Premake Token or a way to introspect the lua variables and generate a list from that?
Note that my command line invocation specifically is
buildcommands "nasm.exe -f win32 -o %{cfg.objdir}%{file.basename}.lib %{file.abspath} -DNDEBUG"
Suppose I set defines { "FEAT_A", "FEAT_B" } in my premake5.lua. I then would like to to add -DFEAT_A -DFEAT_B automatically to that build command similar to the -DNDEBUG so I cannot simply insert a simple token. I guess I do have to do something like this (lua pseudo code as I don't really know the syntax):
define_flags = wks.defines.join(" -D")
buildcoommands("nasm.exe [...]"..define_flags)
Do you know if something like this is possible?
How about something like this?
buildcommands('nasm.exe [...] %{table.implode(cfg.defines, "-D", "", " ")} [...]')

Validate URL in Informix 4GL program

In my Informix 4GL program, I have an input field where the user can insert a URL and the feed is later being sent over to the web via a script.
How can I validate the URL at the time of input, to ensure that it's a live link? Can I make a call and see if I get back any errors?
I4GL checking the URL
There is no built-in function to do that (URLs didn't exist when I4GL was invented, amongst other things).
If you can devise a C method to do that, you can arrange to call that method through the C interface. You'll write the method in native C, and then write an I4GL-callable C interface function using the normal rules. When you build the program with I4GL c-code, you'll link the extra C functions too. If you build the program with I4GL-RDS (p-code), you'll need to build a custom runner with the extra function(s) exposed. All of this is standard technique for I4GL.
In general terms, the C interface code you'll need will look vaguely like this:
#include <fglsys.h>
// Standard interface for I4GL-callable C functions
extern int i4gl_validate_url(int nargs);
// Using obsolescent interface functions
int i4gl_validate_url(int nargs)
{
if (nargs != 1)
fgl_fatal(__FILE__, __LINE__, -1318);
char url[4096];
popstring(url, sizeof(url));
int r = validate_url(url); // Your C function
retint(r);
return 1;
}
You can and should check the manuals but that code, using the 'old style' function names, should compile correctly. The code can be called in I4GL like this:
DEFINE url CHAR(256)
DEFINE rc INTEGER
LET url = "http://www.google.com/"
LET rc = i4gl_validate_url(url)
IF rc != 0 THEN
ERROR "Invalid URL"
ELSE
MESSAGE "URL is OK"
END IF
Or along those general lines. Exactly what values you return depends on your decisions about how to return a status from validate_url(). If need so be, you can return multiple values from the interface function (e.g. error number and text of error message). Etc. This is about the simplest possible design for calling some C code to validate a URL from within an I4GL program.
Modern C interface functions
The function names in the interface library were all changed in the mid-00's, though the old names still exist as macros. The old names were:
popstring(char *buffer, int buflen)
retint(int retval)
fgl_fatal(const char *file, int line, int errnum)
You can find the revised documentation at IBM Informix 4GL v7.50.xC3: Publication library in PDF in the 4GL Reference Manual, and you need Appendix C "Using C with IBM Informix 4GL".
The new names start ibm_lib4gl_:
ibm_libi4gl_popMInt()
ibm_libi4gl_popString()
As to the error reporting function, there is one — it exists — but I don't have access to documentation for it any more. It'll be in the fglsys.h header. It takes an error number as one argument; there's the file name and a line number as the other arguments. And it will, presumably, be ibm_lib4gl_… and there'll be probably be Fatal or perhaps fatal (or maybe Err or err) in the rest of the name.
I4GL running a script that checks the URL
Wouldn't it be easier to write a shell script to get the status code? That might work if I can return the status code or any existing results back to the program into a variable? Can I do that?
Quite possibly. If you want the contents of the URL as a string, though, you'll might end up wanting to call C. It is certainly worth thinking about whether calling a shell script from within I4GL is doable. If so, it will be a lot simpler (RUN "script", IIRC, where the literal string would probably be replaced by a built-up string containing the command and the URL). I believe there are file I/O functions in I4GL now, too, so if you can get the script to write a file (trivial), you can read the data from the file without needing custom C. For a long time, you needed custom C to do that.
I just need to validate the URL before storing it into the database. I was thinking about:
#!/bin/bash
read -p "URL to check: " url
if curl --output /dev/null --silent --head --fail "$url"; then
printf '%s\n' "$url exist"
else
printf '%s\n' "$url does not exist"
fi
but I just need the output instead of /dev/null to be into a variable. I believe the only option is to dump the output into a temp file and read from there.
Instead of having I4GL run the code to validate the URL, have I4GL run a script to validate the URL. Use the exit status of the script and dump the output of curl into /dev/null.
FUNCTION check_url(url)
DEFINE url VARCHAR(255)
DEFINE command_line VARCHAR(255)
DEFINE exit_status INTEGER
LET command_line = "check_url ", url
RUN command_line RETURNING exit_status
RETURN exit_status
END FUNCTION {check_url}
Your calling code can analyze exit_status to see whether it worked. A value of 0 indicates success; non-zero indicates a problem of some sort, which can be deemed 'URL does not work'.
Make sure the check_url script (a) exits with status zero on success and non-zero on any sort of failure, and (b) doesn't write anything to standard output (or standard error) by default. The writing to standard error or output will screw up screen layouts, etc, and you do not want that. (You can obviously have options to the script that enable standard output, or you can invoke the script with options to suppress standard output and standard error, or redirect the outputs to /dev/null; however, when used by the I4GL program, it should be silent.)
Your 'script' (check_url) could be as simple as:
#!/bin/bash
exec curl --output /dev/null --silent --head --fail "${1:-http://www.example.com/"
This passes the first argument to curl, or the non-existent example.com URL if no argument is given, and replaces itself with curl, which generates a zero/non-zero exit status as required. You might add 2>/dev/null to the end of the command line to ensure that error messages are not seen. (Note that it will be hell debugging this if anything goes wrong; make sure you've got provision for debugging.)
The exec is a minor optimization; you could omit it with almost no difference in result. (I could devise a scheme that would probably spot the difference; it involves signalling the curl process, though — kill -9 9999 or similar, where the 9999 is the PID of the curl process — and isn't of practical significance.)
Given that the script is just one line of code that invokes another program, it would be possible to embed all that in the I4GL program. However, having an external shell script (or Perl script, or …) has merits of flexibility; you can edit it to log attempts, for example, without changing the I4GL code at all. One more file to distribute, but better flexibility — keep a separate script, even though it could all be embedded in the I4GL.
As Jonathan said "URLs didn't exist when I4GL was invented, amongst other things". What you will find is that the products that have grown to superceed Informix-4gl such as FourJs Genero will cater for new technologies and other things invented after I4GL.
Using FourJs Genero, the code below will do what you are after using the Informix 4gl syntax you are familiar with
IMPORT com
MAIN
-- Should succeed and display 1
DISPLAY validate_url("http://www.google.com")
DISPLAY validate_url("http://www.4js.com/online_documentation/fjs-fgl-manual-html/index.html#c_fgl_nf.html") -- link to some of the features added to I4GL by Genero
-- Should fail and display 0
DISPLAY validate_url("http://www.google.com/testing")
DISPLAY validate_url("http://www.google2.com")
END MAIN
FUNCTION validate_url(url)
DEFINE url STRING
DEFINE req com.HttpRequest
DEFINE resp com.HttpResponse
-- Returns TRUE if http request to a URL returns 200
TRY
LET req = com.HttpRequest.create(url)
CALL req.doRequest()
LET resp = req.getResponse()
IF resp.getStatusCode() = 200 THEN
RETURN TRUE
END IF
-- May want to handle other HTTP status codes
CATCH
-- May want to capture case if not connected to internet etc
END TRY
RETURN FALSE
END FUNCTION

How to distinguish ERTS versions at pre-processing time?

In the recent Erlang R14, inets' file httpd.hrl has been moved from:
src/httpd.hrl
to:
src/http_server/httpd.hrl
The Erlang Web framework includes the file in several places using the following directive:
-include_lib("inets/src/httpd.hrl").
Since I'd love the Erlang Web to compile with both versions of Erlang (R13 and R14), what I'd need ideally is:
-ifdef(OLD_ERTS_VERSION).
-include_lib("inets/src/httpd.hrl").
-else.
-include_lib("inets/src/http_server/httpd.hrl").
-endif.
Even if it possible to retrieve the ERTS version via:
erlang:system_info(version).
That's indeed not possible at pre-processing time.
How to deal with these situations? Is the parse transform the only way to go? Are there better alternatives?
Not sure if you'll like this hackish trick, but you could use a parse transform.
Let's first define a basic parse transform module:
-module(erts_v).
-export([parse_transform/2]).
parse_transform(AST, _Opts) ->
io:format("~p~n", [AST]).
Compile it, then you can include both headers in the module you want this to work for. This should give the following:
-module(test).
-compile({parse_transform, erts_v}).
-include_lib("inets/src/httpd.hrl").
-include_lib("inets/src/http_server/httpd.hrl").
-export([fake_fun/1]).
fake_fun(A) -> A.
If you're on R14B and compile it, you should have the abstract format of the module looking like this:
[{attribute,1,file,{"./test.erl",1}},
{attribute,1,module,test},
{error,{3,epp,{include,lib,"inets/src/httpd.hrl"}}},
{attribute,1,file,
{"/usr/local/lib/erlang/lib/inets-5.5/src/http_server/httpd.hrl",1}},
{attribute,1,file,
{"/usr/local/lib/erlang/lib/kernel-2.14.1/include/file.hrl",1}},
{attribute,24,record,
{file_info,
[{record_field,25,{atom,25,size}},
{record_field,26,{atom,26,type}},
...
What this tells us is that we can use both headers, and the valid one will automatically be included while the other will error out. All we need to do is remove the {error,...} tuple and get a working compilation. To do this, fix the parse_transform module so it looks like this:
-module(erts_v).
-export([parse_transform/2]).
parse_transform(AST, _Opts) ->
walk_ast(AST).
walk_ast([{error,{_,epp,{include,lib,"inets/src/httpd.hrl"}}}|AST]) ->
AST;
walk_ast([{error,{_,epp,{include,lib,"inets/src/http_server/httpd.hrl"}}}|AST]) ->
AST;
walk_ast([H|T]) ->
[H|walk_ast(T)].
This will then remove the error include, only if it's on the precise module you wanted. Other messy includes should fail as usual.
I haven't tested this on all versions, so if the behaviour changed between them, this won't work. On the other hand, if it stayed the same, this parse_transform will be version independent, at the cost of needing to order the compiling order of your modules, which is simple enough with Emakefiles and rebar.
If you are using makefiles, you can do something like
ERTS_VER=$(shell erl +V 2>&1 | egrep -o '[0-9]+.[0-9]+.[0-9]+')
than match the string and define macro in erlc arguments or in Emakefile.
There is no other way, AFAIK.

Resources