I know that lookahead patterns are not allowed in XSLT regular expressions with Saxon, but I need to know how I can use this regular expression in XSLT 2.0:
<xsl:if test='matches($value,"^(?!\s*$).{x,y}$")'>
where x and y are numbers. Any suggestions please?
According to https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html, the Java regular expression language supports zero-width negative lookahead, (?!X). So you should be able to use it with Saxon by passing the flag ';j', as in matches(string, '(?!subpattern1)subpattern2', ';j'). Saxon allows you (http://saxonica.com/html/documentation/functions/fn/matches.html) to switch to the Java regular expression language using that flag.
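The following is a minimal Java sketch. It assumes, as described above, that the ';j' flag hands the pattern to java.util.regex. It checks the question's pattern with hypothetical bounds x = 2 and y = 5 and compares it with a lookahead-free split into two tests. That split corresponds to the plain XPath 2.0 expression matches($value, '\S') and matches($value, '^.{x,y}$'), which needs no flag at all. Note that the XPath and Java regex dialects differ slightly in what \s and . match.

```java
import java.util.regex.Pattern;

final class NonBlankLengthCheck {
    // Hypothetical values standing in for x and y from the question.
    static final int MIN = 2;
    static final int MAX = 5;

    // The question's pattern; java.util.regex supports the (?!...) lookahead.
    private static final Pattern LOOKAHEAD =
            Pattern.compile("^(?!\\s*$).{" + MIN + "," + MAX + "}$");

    // Lookahead-free split: one pattern for "not blank", one for the length.
    private static final Pattern NON_SPACE = Pattern.compile("\\S");
    private static final Pattern LENGTH =
            Pattern.compile("^.{" + MIN + "," + MAX + "}$");

    static boolean withLookahead(String value) {
        return LOOKAHEAD.matcher(value).matches();
    }

    static boolean withoutLookahead(String value) {
        // find() searches anywhere in the string, like an unanchored XPath matches().
        return NON_SPACE.matcher(value).find() && LENGTH.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"", "   ", "ab", " a ", "abcdef", "a"};
        for (String s : samples) {
            System.out.printf("%-10s lookahead=%b split=%b%n",
                    "\"" + s + "\"", withLookahead(s), withoutLookahead(s));
        }
    }
}
```

Both methods agree on every sample. Blank strings are rejected, and so are strings outside the 2 to 5 character range.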
I've been studying compiler principles recently. I notice that all the textbook examples describe a language's lexical analyzer using "lex" or "flex", with regular expressions, to show how to analyze input source files.
Does that mean the lexical analysis of every known programming language can be done with a type 3 (regular) grammar? Or are textbooks just using simple examples to illustrate the ideas?
Most lexemes in most languages can be identified with regular expressions, but there are exceptions. (When it comes to parsing computer languages, there are always exceptions. Without exception.)
For example, you cannot match a C++ raw string literal with a regex. You cannot tell without syntactic analysis whether /= in a JavaScript program is the single lexeme used to indicate divide-and-assign, or whether it is the start of a regular expression literal which matches a string starting with =. Languages which allow nested comments (unlike C) require something a bit more powerful.
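Nested comments need a depth counter, which a single regular expression cannot provide. The following Java sketch uses illustrative names only. It skips a nested comment by counting "/*" and "*/" pairs, and contrasts this with the lazy regex /\*.*?\*/, which stops at the first */. In flex, a counter like this can be kept in the actions of a start condition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedComments {
    // Returns the index just past the comment that opens at 'start', treating
    // every "/*" as opening a nested level and every "*/" as closing one.
    static int skipNestedComment(String src, int start) {
        if (!src.startsWith("/*", start)) {
            throw new IllegalArgumentException("no comment opens at index " + start);
        }
        int depth = 0;
        int i = start;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                depth++;
                i += 2;
            } else if (src.startsWith("*/", i)) {
                depth--;
                i += 2;
                if (depth == 0) return i;
            } else {
                i++;
            }
        }
        throw new IllegalStateException("unterminated comment opened at index " + start);
    }

    // A single non-nesting regex: it ends at the first "*/" it sees.
    private static final Pattern FLAT_COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    static int flatRegexEnd(String src) {
        Matcher m = FLAT_COMMENT.matcher(src);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String src = "/* a /* b */ c */rest";
        System.out.println(src.substring(skipNestedComment(src, 0))); // rest
        System.out.println(src.substring(flatRegexEnd(src)));         // " c */rest"
    }
}
```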
But it's enormously easier to write a few regexes than to write a full state machine in raw C, so there is a lot of motivation to find ways of bending flex to your will for a few exceptional cases. And flex cooperates to a certain extent by providing features which allow you to escape from the regex straitjacket when necessary. In an advanced class on lexical analysis you might learn more about these features.
I have an existing smt2 file, which I parsed and solved with the Z3 Java API. Now I want to build some simple formulas with the Z3 Java API, and I have a question about how to cast a FuncDecl to an Expr. My original formulas are written purely in an smt2 text file, and everything gets parsed into a single BoolExpr. I have successfully extracted the constants and now need to use them in new formulas. How can I do that? Basically, what I am looking for is how to build an Expr from a FuncDecl, or whether there is a way to cast a FuncDecl to an Expr. Is there any official Java API documentation available? I know there is an example that uses the Z3 Java API, but it is painful to look for a specific API description in such a large example.
FuncDecls are function declarations; they cannot be cast directly to expressions, but the function they represent can be applied (to some arguments) to yield an expression. This is what FuncDecl.apply(...) does. Constants are, of course, a special case in which the function takes no arguments.
Is there a parser generator that also implements the inverse direction, i.e. unparsing domain objects (a.k.a. pretty-printing) from the same grammar specification? As far as I know, ANTLR does not support this.
I have implemented a set of invertible parser combinators in Java and Kotlin. A parser is written pretty much in LL(1) style, and it provides both a parse method and a print method; the latter acts as the pretty printer.
You can find the project here: https://github.com/searles/parsing
Here is a tutorial: https://github.com/searles/parsing/blob/master/tutorial.md
And here is a parser/pretty printer for mathematical expressions: https://github.com/searles/parsing/blob/master/src/main/java/at/searles/demo/DemoInvert.kt
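The following hypothetical Java sketch shows the core idea in miniature. It is not the API of the linked project: Syntax, Parsed, INT and listOf are invented for illustration. Each syntax object has a parse direction and a print direction, so composing objects yields a parser and a canonical pretty printer from a single description. Unlike full invertible combinators, each primitive here implements both directions by hand.

```java
import java.util.ArrayList;
import java.util.List;

final class InvertibleDemo {
    // A successful parse: the value plus the index just past it.
    static final class Parsed<T> {
        final T value;
        final int next;
        Parsed(T value, int next) { this.value = value; this.next = next; }
    }

    // One description, two directions. parse returns null on failure.
    interface Syntax<T> {
        Parsed<T> parse(String in, int pos);
        String print(T value);
    }

    static int skipSpaces(String in, int pos) {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
        return pos;
    }

    // Returns the index after token tok (leading whitespace allowed), or -1.
    static int expect(String in, int pos, String tok) {
        int p = skipSpaces(in, pos);
        return in.startsWith(tok, p) ? p + tok.length() : -1;
    }

    // Non-negative decimal integer.
    static final Syntax<Integer> INT = new Syntax<Integer>() {
        public Parsed<Integer> parse(String in, int pos) {
            int start = skipSpaces(in, pos);
            int end = start;
            while (end < in.length() && in.charAt(end) >= '0' && in.charAt(end) <= '9') end++;
            if (end == start) return null;
            return new Parsed<>(Integer.parseInt(in.substring(start, end)), end);
        }
        public String print(Integer value) { return Integer.toString(value); }
    };

    // "[" [elem ("," elem)*] "]", printed canonically as "[a, b, c]".
    static <T> Syntax<List<T>> listOf(Syntax<T> elem) {
        return new Syntax<List<T>>() {
            public Parsed<List<T>> parse(String in, int pos) {
                int p = expect(in, pos, "[");
                if (p < 0) return null;
                List<T> items = new ArrayList<>();
                int close = expect(in, p, "]");
                if (close >= 0) return new Parsed<>(items, close);
                while (true) {
                    Parsed<T> item = elem.parse(in, p);
                    if (item == null) return null;
                    items.add(item.value);
                    p = item.next;
                    int comma = expect(in, p, ",");
                    if (comma < 0) break;
                    p = comma;
                }
                close = expect(in, p, "]");
                return close < 0 ? null : new Parsed<>(items, close);
            }
            public String print(List<T> value) {
                StringBuilder sb = new StringBuilder("[");
                for (int i = 0; i < value.size(); i++) {
                    if (i > 0) sb.append(", ");
                    sb.append(elem.print(value.get(i)));
                }
                return sb.append(']').toString();
            }
        };
    }

    // Parses the whole input (trailing whitespace allowed) or throws.
    static <T> T parseAll(Syntax<T> syntax, String in) {
        Parsed<T> r = syntax.parse(in, 0);
        if (r == null || skipSpaces(in, r.next) != in.length()) {
            throw new IllegalArgumentException("syntax error in: " + in);
        }
        return r.value;
    }

    public static void main(String[] args) {
        Syntax<List<List<Integer>>> grid = listOf(listOf(INT));
        List<List<Integer>> value = parseAll(grid, " [ [1,2],[ ]  ,[3] ]");
        System.out.println(grid.print(value)); // [[1, 2], [], [3]]
    }
}
```

Printing and reparsing returns an equal value. This round-trip property is what makes a description "invertible".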
Take a look at Invertible syntax descriptions: Unifying parsing and pretty printing.
There are several parser generators that include an implementation of an unparser. One of them is the nearley parser generator for context-free grammars.
It is also possible to implement bidirectional transformations of source code using definite clause grammars. In SWI-Prolog, the phrase/2 predicate can convert an input text into a parse tree and vice-versa.
Our DMS Software Reengineering Toolkit does precisely this (and provides a lot of additional support for analyzing/transforming code). It does this by decorating a language grammar with additional attributes, producing what is called an attribute grammar. We use a special DSL that makes these rules convenient to write.
It helps to know that DMS produces a tree based directly on the grammar.
Each DMS grammar rule is paired with a so-called "prettyprinting" rule. Each prettyprinting rule describes how to "prettyprint" the syntactic element and sub-elements recognized by its corresponding grammar rule. The prettyprinting process essentially manufactures or combines rectangular boxes of text horizontally or vertically (with optional indentation), with leaves producing unit-height boxes containing the literal value of the leaf (keyword, operator, identifier, constant, etc.).
As an example, one might write the following DMS grammar rule and matching prettyprinting rule:
statement = 'for' '(' assignment ';' assignment ';' conditional_expression ')'
            '{' sequence_of_statements '}' ;
<<PrettyPrinter>>:
       { V(H('for','(',assignment[1],';',assignment[2],';',conditional_expression,')'),
           H('{', I(sequence_of_statements)),
           '}');
This will parse the following:
for ( i=x*2;
i--; i>-2*x ) { a[x]+=3;
b[x]=a[x]-1; }
(using additional grammar rules for statements and expressions) and prettyprint it (using additional prettyprinting rules for those additional grammar rules) as follows:
for (i=x*2;i--;i>-2*x)
{ a[x]+=3;
  b[x]=a[x]-1;
}
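The box composition can be modeled in a few lines of Java. In this toy model, a box is a list of text lines. The helpers v, h and indent mirror the V, H and I operators in the rule above, but their exact semantics are assumptions for this sketch and not DMS's documented behavior: top alignment in H, no automatic spacing between boxes, and an indent width passed as a parameter.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxLayout {
    // A leaf is a unit-height box holding its literal text.
    static List<String> leaf(String text) { return List.of(text); }

    static int width(List<String> box) {
        int w = 0;
        for (String line : box) w = Math.max(w, line.length());
        return w;
    }

    // V: stack boxes vertically, left-aligned.
    @SafeVarargs
    static List<String> v(List<String>... boxes) {
        List<String> out = new ArrayList<>();
        for (List<String> b : boxes) out.addAll(b);
        return out;
    }

    // H: place boxes side by side, top-aligned; each box but the last is padded to its width.
    @SafeVarargs
    static List<String> h(List<String>... boxes) {
        int height = 0;
        for (List<String> b : boxes) height = Math.max(height, b.size());
        List<String> out = new ArrayList<>();
        for (int row = 0; row < height; row++) {
            StringBuilder sb = new StringBuilder();
            for (int k = 0; k < boxes.length; k++) {
                List<String> b = boxes[k];
                String line = row < b.size() ? b.get(row) : "";
                sb.append(line);
                if (k < boxes.length - 1) sb.append(" ".repeat(width(b) - line.length()));
            }
            out.add(sb.toString().stripTrailing());
        }
        return out;
    }

    // I: indent every line of a box by n spaces.
    static List<String> indent(int n, List<String> box) {
        List<String> out = new ArrayList<>();
        for (String line : box) out.add(" ".repeat(n) + line);
        return out;
    }

    static String render(List<String> box) { return String.join("\n", box); }

    public static void main(String[] args) {
        List<String> header = h(leaf("for ("), leaf("i=x*2"), leaf(";"), leaf("i--"),
                                leaf(";"), leaf("i>-2*x"), leaf(")"));
        List<String> body = v(leaf("a[x]+=3;"), leaf("b[x]=a[x]-1;"));
        System.out.println(render(v(header, h(leaf("{"), indent(1, body)), leaf("}"))));
    }
}
```

With an indent of 1, the main method reproduces the four-line output shown above. The '{' box and the indented statement box are placed side by side, and the closing '}' is stacked below them.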
DMS also captures comments, attaches them to AST nodes, and regenerates them on output. The implementation is a bit exotic because most parsers don't handle comments, but utilization is easy, even "free"; comments will be automatically inserted in the prettyprinted result in their original places.
DMS can also print in "fidelity" mode. In this mode, it tries to preserve the shape of each token (e.g., number radix, identifier capitalization, which keyword spelling was used) and the column offset (into the line) of each parsed token. This causes the original text (or something so close that you wouldn't notice a difference) to be regenerated.
More details about what prettyprinters must do are provided in my SO answer on Compiling an AST back to source code. DMS addresses all of those topics cleanly.
This capability has been used by DMS on some 40+ real languages, including full IBM COBOL, PL/SQL, Java 1.8, C# 5.0, C (many dialects) and C++14.
By writing a sufficiently interesting set of prettyprinter rules, you can build things like JavaDoc extended to include hyperlinked source code.
It is not possible in general.
What makes a print pretty? A print is pretty if spaces, tabs, and newlines are placed where they make the output look nice.
But most grammars ignore whitespace, because in most languages whitespace is not significant. There are exceptions like Python, but in general whether it is a good idea to make whitespace part of the syntax is still controversial, and therefore most grammars do not treat whitespace as syntax.
And if the abstract syntax tree does not contain whitespace, because the parser has thrown it away, no generator can use it to pretty print an AST.
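The following small Java sketch illustrates this point. Once a typical lexer discards whitespace, two differently formatted inputs produce the same token stream, so a printer has to choose a layout rather than recover one. The token pattern is a made-up example and not any particular language's lexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LossyRoundTrip {
    // Hypothetical token set: numbers, identifiers, or any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[A-Za-z_]\\w*|\\S");

    // Like most lexers, keep the tokens and throw the whitespace away.
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        while (m.find()) out.add(m.group());
        return out;
    }

    // The printer has to choose a layout; here: one space between tokens.
    static String print(List<String> toks) {
        return String.join(" ", toks);
    }

    public static void main(String[] args) {
        String a = "x = a+\n    b";
        String b = "x=a + b";
        System.out.println(tokens(a).equals(tokens(b))); // true: the layout is gone
        System.out.println(print(tokens(a)));            // x = a + b
    }
}
```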
I want to translate a Boolean expression in Z3 into infix notation. For example, given the Z3 expression (>= t 3), I want to get the infix string "t>=3". Is there an existing Z3 API to do this in C#?
No, the official API does not have support for displaying expressions in infix notation. This functionality can be implemented on top of the API for traversing expressions. The Z3 Python API implements an infix printer. Actually, it implements two: one for Python-like syntax, and one for HTML math-like syntax. The source code of these printers is included in the Z3 distribution. The code is written in python, but can be easily converted into any programming language. The code is located at python\z3printer.py.
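The following rough Java sketch shows the "build it on top of expression traversal" idea. To stay self-contained, it reads SMT-LIB-style text such as (>= t 3) instead of calling Z3. With the real API, you would walk Expr objects instead (in the .NET API, roughly via Expr.FuncDecl and Expr.Args). The operator set and the parenthesization policy below are assumptions for this sketch. Unlike the precedence-aware Python printer, it parenthesizes every nested infix operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class InfixPrinter {
    // Operators printed between their arguments; anything else prints as f(a, b).
    private static final Set<String> INFIX =
            Set.of("+", "-", "*", "/", "<", "<=", ">", ">=", "=", "and", "or", "=>");

    // An atom has no args; an application has a head symbol and argument nodes.
    static final class Node {
        final String head;
        final List<Node> args;
        Node(String head, List<Node> args) { this.head = head; this.args = args; }
    }

    static List<String> tokenize(String s) {
        List<String> toks = new ArrayList<>();
        for (String t : s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) toks.add(t);
        }
        return toks;
    }

    // Recursive-descent s-expression reader; pos[0] is the shared cursor.
    static Node read(List<String> toks, int[] pos) {
        String t = toks.get(pos[0]++);
        if (!t.equals("(")) return new Node(t, List.of());
        String head = toks.get(pos[0]++);
        List<Node> args = new ArrayList<>();
        while (!toks.get(pos[0]).equals(")")) args.add(read(toks, pos));
        pos[0]++; // consume ")"
        return new Node(head, args);
    }

    // Nested infix operators are always parenthesized instead of using precedence.
    static String print(Node n, boolean top) {
        if (n.args.isEmpty()) return n.head;
        List<String> parts = new ArrayList<>();
        if (INFIX.contains(n.head) && n.args.size() >= 2) {
            for (Node a : n.args) parts.add(print(a, false));
            String sep = Character.isLetter(n.head.charAt(0)) ? " " + n.head + " " : n.head;
            String s = String.join(sep, parts);
            return top ? s : "(" + s + ")";
        }
        for (Node a : n.args) parts.add(print(a, true));
        return n.head + "(" + String.join(", ", parts) + ")";
    }

    static String toInfix(String smt) {
        return print(read(tokenize(smt), new int[] {0}), true);
    }

    public static void main(String[] args) {
        System.out.println(toInfix("(>= t 3)"));                // t>=3
        System.out.println(toInfix("(and (>= t 3) (< t 10))")); // (t>=3) and (t<10)
    }
}
```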
Some languages use a unary plus operator for implicit conversions, such as coercing a string to a number (e.g. JavaScript) or casting small number types to an int (e.g. most C-based languages), or to be used when overloading operators.
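Java shows the "casting small number types to an int" use. There, unary plus applies unary numeric promotion to its operand (JLS §15.15.3), so the result of +b for a byte b has type int. In the following sketch, the kind overloads are hypothetical helpers whose only purpose is to reveal the static type chosen by overload resolution.

```java
final class UnaryPlusPromotion {
    // Hypothetical overloads used only to reveal the static type of an argument.
    static String kind(byte b) { return "byte"; }
    static String kind(int i) { return "int"; }

    public static void main(String[] args) {
        byte small = 5;
        System.out.println(kind(small));  // byte: exact overload match
        System.out.println(kind(+small)); // int: unary plus promotes byte to int
        char c = 'A';
        System.out.println(+c);           // 65: the char is promoted to int
    }
}
```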
Since the unary plus is primarily used for hackish purposes like this, and also since F# does not perform automatic widening conversions, I was surprised that F# includes the unary plus.
What adds to my surprise is that Haskell does not have a unary plus operator. Since the F# design was influenced by Haskell, I'm curious as to why it was decided that F# needed a unary plus when Haskell apparently didn't.
Can you give an example of a credible use for the unary plus in F#? If you can't, why is it included in the language at all?
I'll summarize the extended comments. Possible reasons (until a more authoritative answer is given):
Consistency with OCaml, from which F# is derived (if you're doing something wrong/unnecessary it's best to keep doing it so people know what to expect :-))
Overloading (mostly for custom types)
Symmetry with unary negation
F# has two core influences:
OCaml, with which it was originally compatible, and
the CLR, on which it is built.
As has been pointed out, OCaml has a unary plus operator, so from that point of view, it was natural for F# to have one as well.
As for the CLR... To my surprise, the Common Language Specification doesn't specify any requirements for languages to support operator overloading. However, it does specify semantics and naming conventions when the mechanism is used. Still, F# was allowed to opt out of using unary plus, just like C# and VB opted out of support for overloading compound assignment operators (+=, etc.).
The most common .NET languages aside from F# (C#, VB and C++/CLI) do allow it and have a unary plus. So from this point of view as well it would be natural for F# to have support for a unary plus operator.
There is a unary plus operator in standard mathematical notation. Most programming languages have standard math notation as the original influence and motivation for the syntax of arithmetic expressions.
According to this reference, it is "Used to declare an overload for the unary plus operator."