How to match the AST tree by the pattern I want? - clang

I'm practice using clang-query to find the AST I want.
The code below:
while or for loop{
a = b + c;
d = e - f;
foo(a);
if (cond) {
foo(d);
}
}
However, I even cannot match the AST by finding + and - in a single statement.
Could anyone help me?
I search on internet and I cannot find someone has same issue-finding multiple binary operator and statement in one statement as me.

Related

Replacement error calculating replacement in clang

I'm trying to refactor if-else branches and am unable to generate accurate Replacements for it. I try to fetch nodes using:
clang::IfStmt *ifstmt = const_cast<clang::IfStmt *>(
result.Nodes.getNodeAs<clang::IfStmt>("if_simple_else_bind_name"))
and then calculate its source range using
ifstmt->getSourceRange()
However, I find that after using this SourceRange for calculating the start and end for the branch sometimes misses a semicolon for cases like these:
if(cond) {
return a;
}
else
return b; <-- here
How do I find the correct source range and further on generate the correct replacement rewriting the whole branch regardless of whether there are any braces or simple statements?
Any help would be highly appreciated!

fastest way to check for a node being connected to n properties

Right now I have a loop generating my distance, but it seems like their could be a more efficient way
taglst = ['term1', 'term2', 'term3', 'term4', 'term5']
for i, each in enumerate(taglst):
word = 'dist'+str(i)
if i == 0:
q += "MATCH p-[r]-(g {name:\'%s\'}) WITH p, r.dist as %s " % (each, word)
else:
temp = 'dist'+str(i-1)
q += "MATCH p-[r]-(g {name:\'%s\'}) WITH p, r.dist + %s as %s " %(each, temp, word)
q += "RETURN p ORDER by %s ASC" %(word)
Any theories or ideas to help would be great, I'm not really concerned with the python just wanted to demonstrate the workflow for the cypher query.
Yes, there should be an easier way. Try this:
MATCH (p)-[r]-(g)
WITH p, r, collect(g) as gNodes
WHERE ALL (gNode in gNodes WHERE gNode.name in ['term1', 'term2', 'term3'])
RETURN p ORDER BY r.something ASC
You will need to tailor this, but the way it works is by checking that all possible g nodes have a name that's in your list. This saves you repetitive matching the way you've structured it now.
however -- you haven't provided enough information to make this query really good. As I've formulated this, it's going to be expensive because there are no constraints on (p)-[r]-(g). Something that vague is a formula to match your entire graph. Much better would be if you put an initial criteria on r or p to narrow things down drastically.

Matching a string with a specific end in LPeg

I'm trying to capture a string with a combination of a's and b's but always ending with b. In other words:
local patt = S'ab'^0 * P'b'
matching aaab and bbabb but not aaa or bba. The above however does not match anything. Is this because S'ab'^0 is greedy and matches the final b? I think so and can't think of any alternatives except perhaps resorting to lpeg.Cmt which seems like overkill. But maybe not, anyone know how to match such a pattern? I saw this question but the problem with the solution there is that it would stop at the first end marker (i.e. 'cat' there, 'b' here) and in my case I need to accept the middle 'b's.
P.S.
What I'm actually trying to do is match an expression whose outermost rule is a function call.
E.g.
func();
func(x)(y);
func_arr[z]();
all match but
exp;
func()[1];
4 + 5;
do not. The rest of my grammar works and I'm pretty sure this boils down to the same issue but for completeness, the grammar I'm working with looks something like:
top_expr = V'primary_expr' * V'postfix_op'^0 * V'func_call_op' * P';';
postfix_op = V'func_call_op' + V'index_op';
And similarly the V'postfix_op'^0 eats up the func_call_op I'm expecting at the end.
Yes, there is no backtracking, so you've correctly identified the problem. I think the solution is to list the valid postfix_op expressions; I'd change V'func_call_op' + V'index_op' to V'func_call_op'^0 * V'index_op' and also change the final V'func_call_op' to V'func_call_op'^1 to allow several function calls at the end.
Update: as suggested in the comments, the solution to the a/b problem would be (P'b'^0 * P'a')^0 * P'b'^1.
How about this?
local final = P'b' * P(-1)
local patt = (S'ab' - final)^0 * final
The pattern final is what we need at the end of the string.
The pattern patt matches the set 'ab' unless it is followed by the final sequence. Then it asserts that we have the final sequence. That stops the final 'b' from being eaten.
This doesn't guarantee that we get any a's (but neither would the pattern in the question have).
Sorry my answer comes too late but I think it's worth to give this question a more correct answer.
As I understand it, you just want a non-blind greedy match. But unfortunately the "official documentation" of LPeg only tells us how to use LPeg for blind greedy match (or repetition). But this pattern can be described by a parsing expression grammar. For rule S if you want to match as many E1 as you can followed by E2, you need to write
S <- E1 S / E2
The solution to a/b problem becomes
S <- [ab] S / 'b'
You might want to optimize the rule by inserting some a's in the first option
S <- [ab] 'a'* S / 'b'
which will reduce the recursions a lot. As for your real problem, here's my answser:
top_expr <- primary_expr p_and_f ';'
p_and_f <- postfix_op p_and_f / func_call_op
postfix_op <- func_call_op / index_op

Can I Use Multiple Properties in a StructuredFormatDisplayAttribute?

I'm playing around with StructuredFormatDisplay and I assumed I could use multiple properties for the Value, but it seems that is not the case. This question (and accepted answer) talk about customizing in general, but the examples given only use a single property. MSDN is not helpful when it comes to usage of this attribute.
Here's my example:
[<StructuredFormatDisplay("My name is {First} {Last}")>]
type Person = {First:string; Last:string}
If I then try this:
let johnDoe = {First="John"; Last="Doe"}
I end up with this error:
<StructuredFormatDisplay exception: Method 'FSI_0038+Person.First}
{Last' not found.>
The error seems to hint at it only capturing the first property mentioned in my Value but it's hard for me to say that with any confidence.
I have figured out I can work around this by declaring my type like this:
[<StructuredFormatDisplay("My name is {Combined}")>]
type Person = {First:string; Last:string} with
member this.Combined = this.First + " " + this.Last
But I was wondering if anyone could explain why I can't use more than one property, or if you can, what syntax I'm missing.
I did some digging in the source and found this comment:
In this version of F# the only valid values are of the form PreText
{PropertyName} PostText
But I can't find where that limitation is actually implemented, so perhaps someone more familiar with the code base could simply point me to where this limitation is implemented and I'd admit defeat.
The relevant code from the F# repository is in the file sformat.fs, around line 868. Omitting lots of details and some error handling, it looks something like this:
let p1 = txt.IndexOf ("{", StringComparison.Ordinal)
let p2 = txt.LastIndexOf ("}", StringComparison.Ordinal)
if p1 < 0 || p2 < 0 || p1+1 >= p2 then
None
else
let preText = if p1 <= 0 then "" else txt.[0..p1-1]
let postText = if p2+1 >= txt.Length then "" else txt.[p2+1..]
let prop = txt.[p1+1..p2-1]
match catchExn (fun () -> getProperty x prop) with
| Choice2Of2 e ->
Some (wordL ("<StructuredFormatDisplay exception: " + e.Message + ">"))
| Choice1Of2 alternativeObj ->
let alternativeObjL =
match alternativeObj with
| :? string as s -> sepL s
| _ -> sameObjL (depthLim-1) Precedence.BracketIfTuple alternativeObj
countNodes 0 // 0 means we do not count the preText and postText
Some (leftL preText ^^ alternativeObjL ^^ rightL postText)
So, you can easily see that this looks for the first { and the last }, and then picks the text between them. So for foo {A} {B} bar, it extracts the text A} {B.
This does sound like a silly limitation and also one that would not be that hard to improve. So, feel free to open an issue on the F# GitHub page and consider sending a pull request!
Just to put a bow on this, I did submit a PR to add this capability and yesterday it was accepted and pulled into the 4.0 branch.
So starting with F# 4.0, you'll be able to use multiple properties in a StructuredFormatDisplay attribute, with the only downside that all curly braces you wish to use in the message will now need to be escaped by a leading \ (e.g. "I love \{ braces").
I rewrote the offending method to support recursion and switched to using a regular expression to detect property references. It seems to work pretty well, although it isn't the prettiest code I've ever written.

Implementing post / pre increment / decrement when translating to Lua

I'm writing a LSL to Lua translator, and I'm having all sorts of trouble implementing incrementing and decrementing operators. LSL has such things using the usual C like syntax (x++, x--, ++x, --x), but Lua does not. Just to avoid massive amounts of typing, I refer to these sorts of operators as "crements". In the below code, I'll use "..." to represent other parts of the expression.
... x += 1 ...
Wont work, coz Lua only has simple assignment.
... x = x + 1 ...
Wont work coz that's a statement, and Lua can't use statements in expressions. LSL can use crements in expressions.
function preIncrement(x) x = x + 1; return x; end
... preIncrement(x) ...
While it does provide the correct value in the expression, Lua is pass by value for numbers, so the original variable is not changed. If I could get this to actually change the variable, then all is good. Messing with the environment might not be such a good idea, dunno what scope x is. I think I'll investigate that next. The translator could output scope details.
Assuming the above function exists -
... x = preIncrement(x) ...
Wont work for the "it's a statement" reason.
Other solutions start to get really messy.
x = preIncrement(x)
... x ...
Works fine, except when the original LSL code is something like this -
while (doOneThing(x++))
{
doOtherThing(x);
}
Which becomes a whole can of worms. Using tables in the function -
function preIncrement(x) x[1] = x[1] + 1; return x[1]; end
temp = {x}
... preincrement(temp) ...
x = temp[1]
Is even messier, and has the same problems.
Starting to look like I might have to actually analyse the surrounding code instead of just doing simple translations to sort out what the correct way to implement any given crement will be. Anybody got any simple ideas?
I think to really do this properly you're going to have to do some more detailed analysis, and splitting of some expressions into multiple statements, although many can probably be translated pretty straight-forwardly.
Note that at least in C, you can delay post-increments/decrements to the next "sequence point", and put pre-increments/decrements before the previous sequence point; sequence points are only located in a few places: between statements, at "short-circuit operators" (&& and ||), etc. (more info here)
So it's fine to replace x = *y++ + z * f (); with { x = *y + z * f(); y = y + 1; }—the user isn't allowed to assume that y will be incremented before anything else in the statement, only that the value used in *y will be y before it's incremented. Similarly, x = *--y + z * f(); can be replaced with { y = y - 1; x = *y + z * f (); }
Lua is designed to be pretty much impervious to implementations of this sort of thing. It may be done as kind of a compiler/interpreter issue, since the interpreter can know that variables only change when a statement is executed.
There's no way to implement this kind of thing in Lua. Not in the general case. You could do it for global variables by passing a string to the increment function. But obviously it wouldn't work for locals, or for variables that are in a table that is itself global.
Lua doesn't want you to do it; it's best to find a way to work within the restriction. And that means code analysis.
Your proposed solution only will work when your Lua variables are all global. Unless this is something LSL also does, you will get trouble translating LSL programs that use variables called the same way in different places.
Lua is only able of modifying one lvalue per statement - tables being passed to functions are the only exception to this rule. You could use a local table to store all locals, and that would help you out with the pre-...-crements; they can be evaluated before the expression they are contained in is evauated. But the post-...-crements have to be evaluated later on, which is simply not possible in lua - at least not without some ugly code involving anonymous functions.
So you have one options: you must accept that some LSL statements will get translated to several Lua statements.
Say you have a LSL statement with increments like this:
f(integer x) {
integer y = x + x++;
return (y + ++y)
}
You can translate this to a Lua statement like this:
function f(x) {
local post_incremented_x = x + 1 -- extra statement 1 for post increment
local y = x + post_incremented_x
x = post_incremented_x -- extra statement 2 for post increment
local pre_incremented_y = y + 1
return y + pre_incremented_y
y = pre_incremented_y -- this line will never be executed
}
So you basically will have to add two statements per ..-crement used in your statements. For complex structures, that will mean calculating the order in which the expressions are evaluated.
For what is worth, I like with having post decrements and predecrements as individual statements in languages. But I consider it a flaw of the language when they can also be used as expressions. The syntactic sugar quickly becomes semantic diabetes.
After some research and thinking I've come up with an idea that might work.
For globals -
function preIncrement(x)
_G[x] = _G[x] + 1
return _G[x]
end
... preIncrement("x") ...
For locals and function parameters (which are locals to) I know at the time I'm parsing the crement that it is local, I can store four flags to tell me which of the four crements is being used in the variables AST structure. Then when it comes time to output the variables definition, I can output something like this -
local x;
function preIncrement_x() x = x + 1; return x; end
function postDecrement_x() local y = x; x = x - 1; return y; end
... preIncrement_x() ...
In most of your assessment of configurability to the code. You are trying to hard pass data types from one into another. And call it a 'translator'. And in all of this you miss regex and other pattern match capacities. Which are far more present in LUA than LSL. And since the LSL code is being passed into LUA. Try using them, along with other functions. Which would define the work more as a translator, than a hard pass.
Yes I know this was asked a while ago. Though, for other viewers of this topic. Never forget the environments you are working in. EVER. Use what they give you to the best ability you can.

Resources