Ninja build in Xtext

I'm trying to define a grammar for Ninja build files with Xtext.
There are three tricky points that I can't resolve.
Indentation by tab:
How should indentation be handled? A rule in a Ninja build file may have several variable definitions indented with a leading tab (similar to makefiles). This becomes a problem when the language has single-line comments, ignores whitespace, and uses tabs for indentation (Python, make, ...).
cflags = -g
rule cc
    command = gcc $cflags -c $in -o $out
Cross referencing reserved set of variable names:
There exists a set of reserved variables. Auto-complete should be able to reference both the reserved and the user defined set of variables.
command = gcc $cflags -c $in -o $out
Autocompleting cross-referenced variable names which aren't separated by WS:
org.eclipse.xtext.common.Terminals hides WS tokens, and ID tokens are separated by whitespace. But in a Ninja script (similar to makefiles) the parser should pick the longest matching variable name.
some_var = some_value
command = $some_var.h
Any ideas are appreciated. Thanks.

Check out the Xtext 2.8.0 release: https://www.eclipse.org/Xtext/releasenotes.html
The Whitespace-Aware Languages section states:
Xtext 2.8 supports languages in which whitespace is used to specify
the structure, e.g. using indentation to delimit code blocks as in
Python. This is done through synthetic tokens defined in the grammar:
terminal BEGIN: 'synthetic:BEGIN';
terminal END: 'synthetic:END';
These tokens can be used like other terminals in grammar rules:
WhitespaceAwareBlock:
BEGIN
...
END;
The new example language Home Automation available in the Eclipse examples (File → New → Example → Xtext Examples) demonstrates this concept. It allows code like the following:
Rule 'Report error' when Heater.error then
    var String report
    do
        Thread.sleep(500)
        report = HeaterDiagnostic.readError
    while (report == null)
    println(report)
More details are found in the documentation.
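To make the idea of synthetic tokens more concrete, here is a minimal Python sketch of how leading tabs could be turned into BEGIN/END markers before parsing. This is not how Xtext implements it; the function name tokenize_indentation and the one-tab-per-level convention are assumptions made for illustration only.

```python
def tokenize_indentation(text):
    """Turn leading-tab indentation into synthetic BEGIN/END markers."""
    tokens = []
    depth = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines do not change the block structure
        stripped = line.lstrip("\t")
        level = len(line) - len(stripped)  # number of leading tabs
        while depth < level:  # deeper indentation opens blocks
            tokens.append("BEGIN")
            depth += 1
        while depth > level:  # shallower indentation closes blocks
            tokens.append("END")
            depth -= 1
        tokens.append(stripped)
    tokens.extend(["END"] * depth)  # close blocks still open at end of input
    return tokens


source = "cflags = -g\nrule cc\n\tcommand = gcc $cflags -c $in -o $out\n"
for token in tokenize_indentation(source):
    print(token)
```

With this token stream a grammar rule for a Ninja rule body only has to expect BEGIN, a list of variable definitions, and END, instead of dealing with tabs directly.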

Related

How to type AND in regex word matching

I'm trying to do a word search with regex and wonder how to type AND for multiple criteria.
For example, how to type the following:
(Start with a) AND (Contains p) AND (Ends with e), such as the word apple?
Input
apple
pineapple
avocado
Code
grep -E "regex expression here" input.txt
Desired output
apple
What should the regex expression be?
In general you can't implement "and" in a single regexp (though you can implement "then" with .*), but you can in a multi-regexp condition using a tool that supports it.
Your example doesn't really exercise "and". A better test would be "starts with a and includes p and includes l and ends with e", with input that includes alpine. That isn't trivial to express in a regexp by just putting .*s between the characters, but it is trivial as a multi-regexp condition:
$ cat file
apple
pineapple
avocado
alpine
Using &&s will find both words regardless of the order of p and l as desired:
$ awk '/^a/ && /p/ && /l/ && /e$/' file
apple
alpine
but, as you can see, you can't just use .*s to implement and:
$ grep '^a.*p.*l.*e$' file
apple
If you had to use a single regexp then you'd have to do something like:
$ grep -E '^a.*(p.*l|l.*p).*e$' file
apple
alpine
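The same two approaches can be sketched in Python, in case a scripting language is an option. The first part mirrors the awk multi-regexp condition. The second uses lookaheads, which Python's re module supports but POSIX ERE (grep -E) does not. The names words, conditions and matches_all are made up for this sketch.

```python
import re

words = ["apple", "pineapple", "avocado", "alpine"]

# Multi-regexp condition: every pattern must match, in any order.
conditions = [re.compile(p) for p in (r"^a", r"p", r"l", r"e$")]


def matches_all(word):
    return all(c.search(word) for c in conditions)


and_matches = [w for w in words if matches_all(w)]

# Single regexp: each lookahead checks one "includes" condition
# without consuming input, so the order of p and l does not matter.
lookahead = re.compile(r"^(?=.*p)(?=.*l)a.*e$")
lookahead_matches = [w for w in words if lookahead.search(w)]

print(and_matches)
print(lookahead_matches)
```

Both lists contain apple and alpine for this input.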
There are two ways you can do it.
A chain of "&&" is the same as negating a chain of "||" over the negated conditions (De Morgan's law), so you can write the reverse of what you want and negate the result.
At the single-bit level, AND is the same as multiplying the bits, so if you think a chain of && is overly verbose, you can "multiply" the patterns together directly:
awk '/^a/ * /p/ * /e$/'
Multiplying them is the same as performing multiple logical ANDs all at once (but only use the shorthand if the inputs aren't too gigantic, or when the savings from early exit are known to be negligible, since multiplication evaluates every pattern while && short-circuits).
Don't think of them as merely regex patterns. It's easier to think of anything not inside an action block, what's typically referred to as the pattern, as any combination of items that can be evaluated to a boolean outcome of TRUE or FALSE in the end.
POSIX-compliant expressions that work in that space include:
sprintf()
field assignments, etc.
(even decrementing NR, if there's such a need)
but not:
statements like next, print, printf()
delete array, etc., or any of the loop structures
Surprisingly though, getline is directly doable in the pattern space area (with some wrapper workaround).
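As a quick sanity check of the two equivalences (not awk, just the same logic in Python), the sketch below computes one 0/1 flag per pattern and combines the flags three ways. The helper name flags and the word list are assumptions for illustration.

```python
import re
from math import prod

words = ["apple", "pineapple", "avocado", "alpine"]
patterns = (r"^a", r"p", r"e$")


def flags(word):
    """One 0/1 flag per pattern, like awk's /re/ used as a number."""
    return [int(bool(re.search(p, word))) for p in patterns]


by_and = [w for w in words if all(flags(w))]
by_product = [w for w in words if prod(flags(w))]
# De Morgan: A and B and C == not (not A or not B or not C)
by_de_morgan = [w for w in words if not any(not f for f in flags(w))]

print(by_and, by_product, by_de_morgan)
```

All three selections agree, which is the point of both suggestions above.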

Is there a way to insert phases between the lexer and parser in ANTLR

I am writing a lexer/parser for a language that allows abbreviations (and globs) for its keywords. And, I am trying to determine the best way to do it.
And one thought that occurs to me, is to insert a phase between the lexer and the parser, where the lexer recognizes the general class, e.g. is this a "command name" or is it an "option" and then passes those general tokens to a second phase which does further analysis and recognizes which command name it is and passes that on as the token type to the parser.
It will make the parser simple. I will only have to deal with well formed command names. Every token will be clear what it means.
It will keep the lexer simple. It will only have to divide things into classes. This is a simple name. This is a glob. This is an option name (starts with a dash).
The phase in the middle will also be relatively simple. The simple name (and option forms) will only have to deal with strings. The glob form can use standard glob techniques to match the glob against the legal candidates, which are in the tables for the simple names and options.
The question is how to insert that phase into ANTLR, so that I call the lexer and it creates tokens and the intermediate phase massages them and then the parser gets the tokens the intermediate phase has categorized.
Is there a known solution for this?
Something like:
lexer grammar simple
letter: [A-Z][a-z];
digit: [0-9];
glob-char: [*?];
name: letter (letter | digit)*;
option: '-'name;
glob: (glob-char|letter)(glob-char|letter|digit)*;
glob-option: '-'glob;
filter grammar name;
end: 'e' | 'end';
generate: 'ge' | 'generate';
goto: 'go' | 'goto';
help: 'h' | 'help';
if: 'i' | 'if';
then: 't' | 'then';
parser grammar simple;
The user (the programmer writing in the language I am parsing) needs to be able to write g*te and have it match generate.
The phase between the lexer and the parser when it sees a glob needs to look at the glob (and the list of keywords) and see if only one of them matches the glob and if so, return that keyword. The stuff I listed in the "filter grammar" is the stuff that builds the list of keywords globs can match. I have found code on the web that matches globs to a list of names. That part isn't hard.
And I've since found in the ANTLR doc how to run arbitrary code on matching a token and how to change the resulting token's type. (See my answer.)
It looks like you can use lexerCustomActions to achieve the desired effect. Something like the following.
In your lexer:
GLOB: [-A-Za-z0-9_.]* '*' [-A-Za-z0-9_.*]* { setType(lexGlob(getText())); } ;
In your Java code (or whatever target language you are using), for example in the lexer's @members section:
// Maps the matched glob text to the token type of the keyword it denotes.
int lexGlob(String origText) {
    return xyzzy; // some code that computes the right kind of token type
}
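The body of lexGlob is the glob-to-keyword lookup described in the question. A minimal sketch of that lookup, written in Python for brevity rather than as ANTLR target code, could look like the following. The keyword list is taken from the "filter grammar" in the question; resolve_glob is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Keyword table, copied from the "filter grammar" in the question.
KEYWORDS = ["end", "generate", "goto", "help", "if", "then"]


def resolve_glob(glob, keywords=KEYWORDS):
    """Return the single keyword matched by glob, or None if zero or several match."""
    hits = [k for k in keywords if fnmatchcase(k, glob)]
    return hits[0] if len(hits) == 1 else None


print(resolve_glob("g*te"))  # only "generate" ends in "te"
print(resolve_glob("g*"))    # ambiguous: "generate" and "goto" both match
```

Returning None for the ambiguous case leaves it to the caller to decide whether to report an error or fall back to a generic token type.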

flex default rule can be matched

I am working on a flex scanner using flex 2.6.4 with the -s option specified. A particular start condition has the following patterns (I am trying to read everything up to the next unescaped newline):
\\(.|\n)
[^\\\n]+
\n
Yet I get the warning: "-s option given but default rule can be matched"
I don't see any holes in the above pattern set. Am I missing something, or is this a flex error?
Your set of rules does not match a backslash at the end of the file.
Your first rule requires the backslash to be followed by something and the other ones don't match backslashes at all.
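To see the gap without running flex, here is a small Python simulation of the three rules. It is only an approximation of flex's matching (longest match at each position, no start conditions), and first_unmatched is a made-up helper name.

```python
import re

# The three rules from the start condition, translated to Python regexes.
RULES = [
    re.compile(r"\\(?:.|\n)"),  # backslash followed by any character
    re.compile(r"[^\\\n]+"),    # run of characters other than backslash/newline
    re.compile(r"\n"),          # unescaped newline
]


def first_unmatched(text):
    """Return the index where no rule matches, or None if all input is covered."""
    pos = 0
    while pos < len(text):
        lengths = [m.end() - pos for r in RULES if (m := r.match(text, pos))]
        if not lengths:
            return pos  # this is where flex would fall back to the default rule
        pos += max(lengths)  # flex prefers the longest match
    return None


print(first_unmatched("abc\\\ndef\n"))  # escaped newline: fully covered
print(first_unmatched("abc\\"))         # backslash at end of input: index 3
```

One possible way to close the gap would be an extra rule that matches a lone backslash; since flex prefers the longest match, it would only fire when nothing follows the backslash.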

How to make the output of Maxima cleaner?

I want to make use of Maxima as the backend to solve some computations used in my LaTeX input file.
I did the following steps.
Step 1
Download and install Maxima.
Step 2
Create a batch file named cas.bat (for example) as follows.
rem cas.bat
echo off
set PATH=%PATH%;"C:\Program Files (x86)\Maxima-5.31.2\bin"
maxima --very-quiet -r %1 > solution.tex
Save the batch in the same directory in which your input file below exists. It is just for the sake of simplicity.
Step 3
Create the input file named main.tex (for example) as follows.
% main.tex
\documentclass[preview,border=12pt,12pt]{standalone}
\usepackage{amsmath}
\def\f(#1){(#1)^2-5*(#1)+6}
\begin{document}
\section{Problem}
Evaluate $\f(x)$ for $x=\frac 1 2$.
\section{Solution}
\immediate\write18{cas "x: 1/2;tex(\f(x));"}
\input{solution}
\end{document}
Step 4
Compile the input file with pdflatex -shell-escape main and you will get a nice output.
Step 5
Done.
Questions
Apparently the output of Maxima is as follows. I don't know how to make it cleaner.
solution.tex
1
-
2
$${{15}\over{4}}$$
false
Now, my questions are:
How can I remove the extra text?
How can I obtain just \frac{15}{4} without $$...$$?
(1) To suppress output, terminate input expressions with dollar sign (i.e. $) instead of semicolon (i.e. ;).
(2) To get just the TeX-ified expression sans the environment delimiters (i.e. $$), call tex1 instead of tex. Note that tex1 returns a string, which you have to print yourself (while tex prints it for you).
Combining these ideas with the stuff you showed, I think your program could look like this:
"x: 1/2$ print(tex1(\f(x)))$"
I think you might find the Maxima mailing list helpful. I'm pretty sure there have been several attempts to create a system such as the one you describe. You can also look at the documentation.
I couldn't find any way to completely clean up Maxima's output within Maxima itself. It always echoes the input line, and always writes some whitespace after the output. The following is an example of a perl script that accomplishes the cleanup.
#!/usr/bin/perl
use strict;
my $var = $ARGV[0];
my $expr = $ARGV[1];
sub do_maxima_to_tex {
    my $m = shift;
    my $c = "maxima --batch-string='exptdispflag:false; print(tex1($m))\$'";
    my $e = `$c`;
    my @x = split(/\(%i\d+\)/,$e); # output contains stuff like (%i1)
    my $f = pop @x; # remove everything before the echo of the last input
    while ($f=~/\A /) {$f=~s/\A .*\n//} # remove echo of input, which may be more than one line
    $f =~ s/\\\n//g; # maxima breaks latex tokens in the middle at end of line; fix this
    $f =~ s/\n/ /g; # if multiple lines, get it into one line
    $f =~ s/\s+\Z//; # get rid of final whitespace
    return $f;
}
my $e1 = do_maxima_to_tex("diff($expr,$var,1)");
my $e2 = do_maxima_to_tex("diff($expr,$var,2)");
print <<TEX;
The first derivative is \$$e1\$. Differentiating a second time,
we get \$$e2\$.
TEX
If you name this script a.pl, then doing
a.pl z 3*z^4
outputs this:
The first derivative is $12\,z^3$. Differentiating a second time,
we get $36\,z^2$.
For the OP's application, a script like this one could be what is invoked by the write18 in the latex file.
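If Perl is not available, the same cleanup steps can be sketched in Python. The version below only covers the string post-processing, not the call to maxima, and the sample string is a hypothetical stand-in for real Maxima output, made up for illustration. clean_maxima_output is likewise an assumed name.

```python
import re


def clean_maxima_output(raw):
    """Reduce raw Maxima batch output to the TeX string printed last."""
    # Keep only what follows the last input label such as (%i1).
    tail = re.split(r"\(%i\d+\)", raw)[-1]
    # Drop the echoed input, which may span several space-prefixed lines.
    while tail.startswith(" "):
        stripped = re.sub(r"\A .*\n", "", tail)
        if stripped == tail:
            break  # no newline left to cut at; avoid looping forever
        tail = stripped
    # Rejoin TeX tokens that were split by a backslash at the end of a line.
    tail = tail.replace("\\\n", "")
    # Collapse remaining line breaks and trim trailing whitespace.
    return tail.replace("\n", " ").rstrip()


# Hypothetical raw output, not captured from a real Maxima run.
sample = "(%i1) print(tex1(diff(3*z^4,z,1)))$\n12\\,z^3 \n"
print(clean_maxima_output(sample))
```

The guard inside the loop is an addition of this sketch: it stops the loop when an echoed line has no trailing newline.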
If you really want to use LaTeX then the maxiplot package is the answer. It provides a maxima environment inside of which you enter Maxima commands. When you process your LaTeX file a Maxima batch file is generated. Process this file with Maxima and process your LaTeX file again to typeset the equations generated by Maxima.
If you would rather have 2D math input with live typesetting then use TeXmacs. It is a cross-platform document authoring environment (a word processor on steroids if you like) that includes plugins for Maxima, Mathematica and many more scientific computing tools. If you need to or are not satisfied with the typesetting, you can export your document to LaTeX.
I know this is a very old post, and the answers above are excellent for the question the OP asked. Like the OP, I used the --very-quiet -r options on the command line for a long time, but in Maxima version 5.43.2 they behave differently. See "maxima command line v5.43 is behaving differently than v5.41". I am adding this cross-reference so that, when you incorporate these answers into your own solutions, you also account for the changed behavior of those command-line flags.

xtext keywords and terminal rule

I don't understand the difference between a keyword (a symbol written in single quotes, '') and a terminal symbol (also written in single quotes, ''). The documentation says keywords are a kind of terminal rule literals. Please help me.
I don't quite understand your question. What difference are you referring to? Keywords are terminal symbols, that are defined inline in production rules compared to terminal rules which offer more syntactic flexibility but cannot be defined inline.
Keywords are a kind of terminal rule literals. The ID rule in org.eclipse.xtext.common.Terminals for instance starts with a keyword:
terminal ID : '^'? .. ;
I took this paragraph from the Xtext documentation. But we also define terminal rules with ''. What I don't understand is: is '^' a keyword? Also, why use "" (double quotes) for keywords?
