clang ASTMatcher without expanded from macro - clang

I am writing a clang ASTMatcher to find the locations where isnan is used in my source code. I am trying to understand why there are three matches even though I have restricted the match to the main file. Please find a sample source code below:
#include <math.h>
int main()
{
if(isnan(0.0)){
}
}
When I do clang-query match I am getting the below output:
clang-query> match declRefExpr(isExpansionInMainFile())
Match #1:
/home/clang-llvm/code/test.cpp:6:5: note: "root" binds here
if(isnan(0.0)){
^~~~~~~~~~
/usr/include/math.h:299:9: note: expanded from macro 'isnan'
? __isnanf (x) \
^~~~~~~~
Match #2:
/home/clang-llvm/code/test.cpp:6:5: note: "root" binds here
if(isnan(0.0)){
^~~~~~~~~~
/usr/include/math.h:301:9: note: expanded from macro 'isnan'
? __isnan (x) : __isnanl (x))
^~~~~~~
Match #3:
/home/clang-llvm/code/test.cpp:6:5: note: "root" binds here
if(isnan(0.0)){
^~~~~~~~~~
/usr/include/math.h:301:23: note: expanded from macro 'isnan'
? __isnan (x) : __isnanl (x))
^~~~~~~~
3 matches.
Is there any way to restrict the match to the source code only, and not the macro?
I would appreciate any help.

The macro is treated as pure text replacement during preprocessing, which happens before any matching starts. A quick grep of math.h gives me this:
# define isnan(x) \
(sizeof (x) == sizeof (float) \
? __isnanf (x) \
: sizeof (x) == sizeof (double) \
? __isnan (x) : __isnanl (x))
This explains why you get three matches: all three calls are already in your main function before you run the AST matcher.
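To make the expansion concrete, here is a minimal sketch with the same shape as the macro above. MY_ISNAN, my_isnanf, my_isnan, my_isnanl and check_value are made-up names standing in for the glibc internals, so treat this as an illustration rather than what math.h actually contains on your system:

```c
/* Hypothetical stand-ins for glibc's __isnanf, __isnan and __isnanl. */
static int my_isnanf(float x)       { return x != x; }
static int my_isnan(double x)       { return x != x; }
static int my_isnanl(long double x) { return x != x; }

/* Same nested-conditional shape as the isnan macro quoted above. */
#define MY_ISNAN(x)                        \
    (sizeof(x) == sizeof(float)            \
         ? my_isnanf(x)                    \
         : sizeof(x) == sizeof(double)     \
               ? my_isnan(x) : my_isnanl(x))

/* After preprocessing, this body refers to all three helpers,
   although only one of them is evaluated at run time. */
int check_value(double value) {
    return MY_ISNAN(value);
}
```

Each of the three helper names inside check_value is a separate function reference in the AST, which is why a declRefExpr matcher reports three results for a single use of the macro.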
Whether you can get a single location depends on your source code. In this particular case, you can do so by changing your node matcher to a conditional operator:
clang-query> match conditionalOperator(hasFalseExpression(conditionalOperator()), isExpansionInMainFile())
Match #1:
~/test.cpp:4:8: note: "root" binds here
if(isnan(0.0)){
^~~~~~~~~~
/usr/include/math.h:254:7: note: expanded from macro 'isnan'
(sizeof (x) == sizeof (float) \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 match.
So here I am matching the expression that remains after the macro has been expanded.
Hope it helps.

Related

Lex: expected expression before ‘[’ token when writing regular expressions

I'm new to lex/yacc and following this tutorial: https://www.youtube.com/watch?v=54bo1qaHAfk
Here's my lex file:
%{
#include "main.h"
#include <stdio.h>
%}
%%
[a-zA-Z][_a-zA-Z0-9]* {return IDENTIFIER;}
"&" {return RUN_DAEMON;}
"|" {return SYM_PIPE;}
">" {return RED_STDOUT;}
"<" {return RED_STDIN;}
">>" {return APP_STDOUT;}
[ \t\n]+ {;}
. {printf("unexpected character\n");}
%%
int yywrap(){
return 1;
}
However, after running the lex command, when I try to compile lex.yy.c with gcc it spams me with these errors:
sbash.l: In function ‘yylex’:
sbash.l:7:5: error: expected expression before ‘[’ token
[a-zA-Z][_a-zA-Z0-9]* {return IDENTIFIER;}
^
sbash.l:7:6: error: ‘a’ undeclared (first use in this function)
[a-zA-Z][_a-zA-Z0-9]* {return IDENTIFIER;}
^
sbash.l:7:6: note: each undeclared identifier is reported only once for each function it appears in
sbash.l:7:14: error: ‘_a’ undeclared (first use in this function)
[a-zA-Z][_a-zA-Z0-9]* {return IDENTIFIER;}
^~
sbash.l:7:17: error: ‘zA’ undeclared (first use in this function)
[a-zA-Z][_a-zA-Z0-9]* {return IDENTIFIER;}
^~
sbash.l:7:20: error: ‘Z0’ undeclared (first use in this function)
[a-zA-Z][_a-zA-Z0-9]* {return IDENTIFIER;}
^~
sbash.l:7:29: error: expected expression before ‘{’ token
[a-zA-Z][_a-zA-Z0-9]* {return IDENTIFIER;}
^
sbash.l:13:7: error: stray ‘\’ in program
[ \t\n]+ {;}
^
sbash.l:13:9: error: stray ‘\’ in program
[ \t\n]+ {;}
Unfortunately I cannot find what's going wrong, even after googling (many examples write their expressions exactly as my code does).
My lex version is 2.6.1 and is on CentOS8
As explained in the Flex manual chapter on flex input file format, pattern rules must start at the left margin:
The rules section of the flex input contains a series of rules of the form:
pattern action
where the pattern must be unindented and the action must begin on the same line. (Some emphasis added)
Indented lines on the rules section are just passed through as-is. In particular, indented lines prior to the first rule are inserted at the top of the yylex function, which is frequently useful. But flex makes no attempt to verify that code included in this way is valid; errors will be detected when the generated scanner is compiled.

It's possible yylval be a struct instead a union?

In Bison, is it possible for yylval to be a struct instead of a union? I know that I can define yylval as a union with %union{}, but is there a way to define yylval as a struct? For example, to return the line and the string of an identifier and access this information in the action of some grammar rule in Bison.
Yes, you can #define YYSTYPE to be any type you want instead of using %union. However, it is rarely useful to do so [1]; if you want source position info, you're much better off using %locations in combination with %union.
It's also possible (and common) to use structs within the %union declaration. This makes it easy for some rules to return multiple values (effectively).
[1] The main problem being that if you use %type to specify the use of one struct field, it's painful to use other fields in the same action. You need to do everything manually, thus losing the benefit of bison's union type checking.
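As a rough C-level sketch of the struct-inside-%union approach: the type below is approximately what a declaration such as %union { int number; Ident ident; } would give you. Ident, SemanticValue and make_ident are names invented for this illustration; in a real grammar the union is generated by bison and is called YYSTYPE.

```c
/* Hypothetical semantic value for an identifier token. */
typedef struct Ident {
    int line;          /* line on which the identifier was seen */
    const char *name;  /* text of the identifier */
} Ident;

/* Stand-in for the union bison would generate from %union. */
typedef union SemanticValue {
    int number;   /* e.g. for integer literals */
    Ident ident;  /* e.g. for identifiers: several fields at once */
} SemanticValue;

/* Build the value a lexer might store in yylval for an identifier. */
SemanticValue make_ident(int line, const char *name) {
    SemanticValue v;
    v.ident.line = line;
    v.ident.name = name;
    return v;
}
```

With the token declared to use the ident member, an action can then read both fields of the same symbol, for example its line and its name.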
If you want to keep location information (line number and column number) for your tokens, you can use Bison's location facility, which keeps a location object for each token and non-terminal separately from the semantic value. In an action, you refer to a symbol's location as @n.
The location stack is created and maintained automatically by bison if it sees that you have referred to a location anywhere in a rule.
By default, the location datatype is:
typedef struct YYLTYPE {
int first_line;
int first_column;
int last_line;
int last_column;
} YYLTYPE;
The location information for tokens must be set by the lexer. If you are using the default API, it is stored in the global variable yylloc. The parser will create location information for non-terminals by using the range from the beginning of the first item of a production up to the end of the last item. (For empty productions, a zero-length location object is generated, starting and ending with the start position of the lookahead token.)
Both of these defaults can be overridden if necessary. See the Bison manual for details.
Flex will track line numbers if asked to with %option yylineno, but it does not track column positions, which is a bit annoying. Also, yylloc requires both a starting and an ending line number; yylineno in a flex action will be the line number at the end of the token. Most commonly, you will use the YY_USER_ACTION macro to maintain the value of yylloc; an example implementation (taken from this answer, which you should read if you use this code) is:
%option yylineno
%{
#define YY_USER_ACTION \
    yylloc.first_line = yylloc.last_line; \
    yylloc.first_column = yylloc.last_column; \
    if (yylloc.first_line == yylineno) \
        yylloc.last_column += yyleng; \
    else { \
        int col; \
        for (col = 1; yytext[yyleng - col] != '\n'; ++col) {} \
        yylloc.last_column = col; \
        yylloc.last_line = yylineno; \
    }
%}
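If it helps to see what that macro does outside of a scanner, here is the same update written as a plain C function. This is only a sketch: Location and advance_location are invented names, and the text, len and lineno parameters play the roles of yytext, yyleng and yylineno (the line number at the end of the token).

```c
/* Same fields as the default YYLTYPE shown earlier. */
typedef struct Location {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} Location;

/* Move loc past one token, mirroring the YY_USER_ACTION logic above. */
void advance_location(Location *loc, const char *text, int len, int lineno) {
    /* The new token starts where the previous one ended. */
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;
    if (loc->first_line == lineno) {
        /* No newline inside the token: just move right. */
        loc->last_column += len;
    } else {
        /* The token contains a newline: count back to the last one. */
        int col;
        for (col = 1; text[len - col] != '\n'; ++col) {}
        loc->last_column = col;
        loc->last_line = lineno;
    }
}
```

Starting from a location of {1, 1, 1, 1}, the token "abc" on line 1 ends up with first_column 1 and last_column 4, so last_column here is the column just after the token.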

Unrecognized rule in Flex lexer

I am in the process of making an XML parser.
As the title suggests, I have declared the rules as shown in my code below, but flex seems to miss a specific one.
Error: [image: command-line error output]
The line in question is:
{boolean} {yylval.booleanval = strdup(yytext); if(err==1){printf("\t\t\t\t\t\t");}; return BOOLEAN;}
Although the rule is clearly declared, flex seems to disregard it, while no such problem arises for the other rules.
Flex Code :
%option noyywrap
%option yylineno
string [_a-zA-Z][_a-zA-Z0-9]*
digit [0-9]
integer {digit}+
boolean "True" | "False"
text ({string}| )*
%%
. {printf("%s",yytext);}
{boolean} {yylval.booleanval = strdup(yytext); if(err==1){printf("\t\t\t\t\t\t");}; return BOOLEAN;}
{integer} {return INT;}
{string} {return STRING;}
%%
Rereading the question, I think there is a terminology problem. The rule is
{boolean} {yylval.booleanval = strdup(yytext); if(err==1){printf("\t\t\t\t\t\t");}; return BOOLEAN;}
Like all rules, that rule consists of a pattern and an action. The pattern {boolean} consists only of a macro expansion. Once the macro is expanded, the line can no longer be recognised as a rule because of stray whitespace in the macro's definition, as I explained in the original answer below:
As indicated by the error message, the problem is the pattern in line 22 of your flex file, which contains a macro expansion of boolean:
boolean "True" | "False"
Flex patterns may not contain unquoted whitespace, whether entered directly or through a macro.
If you insist on using a macro, it could be:
boolean True|False
Although nothing prevents you from inserting the pattern directly in the rule:
True|False {yylval.booleanval = strdup(yytext); if(err==1){printf("\t\t\t\t\t\t");}; return BOOLEAN;}

Jflex Unexpected Character error

I started studying JFlex. When I try to generate output using JFlex for the following code, I keep getting an error:
Error in file "\abc.flex" (line 29):
Unexpected character
[ \t\n]+ ;
^
1 error, 0 warnings.
Generation aborted.
Code trying to run
letter [a-zA-Z]
digit [0-9]
intlit [0-9]+
%{
#include <stdio.h>
# define BASTYPTOK 257 /*following are output from yacc*/
# define IDTOK 258 /*yacc assigns token numbers */
# define LITTOK 259
# define CINTOK 260
# define INSTREAMTOK 261
# define COUTTOK 262
# define OUTSTREAMTOK 263
# define WHILETOK 264
# define IFTOK 265
# define ADDOPTOK 266
# define MULOPTOK 267
# define RELOPTOK 268
# define NOTTOK 269
# define STRLITTOK 270
main() /*this replaces the main in the lex library*/
{ int p;
while (p= yylex())
printf("%d is \"%s\"\n", p, yytext);
/*yytext is where lex stores the lexeme*/}
%}
%%
[ \t\n]+ ;
"//".*"\n" ;
{intlit} {return(LITTOK);}
cin {return(CINTOK);}
"<<" {return(INSTREAMTOK);}
\<|"==" {return(RELOPTOK);}
\+|\-|"||" {return(ADDOPTOK);}
"=" {return(yytext[0]);}
"(" {return(yytext[0]);}
")" {return(yytext[0]);}
. {return (yytext[0]); /*default action*/}
%%
Can someone please help me figure out what is causing the issue?
The pattern is also properly unindented.
Thanks for your help.
That's valid flex input but it's not valid jflex. Since the included code is in C rather than Java, it's not clear to me why you would want to use jflex, but if your intent is to port the scanner to Java you might want to read the JFlex manual section on porting.
In particular, the sections in JFlex input are quite different from flex:
flex                            JFlex
definitions and declarations    user code
%%                              %%
rules                           declarations
%%                              %%
user code                       definitions and rules
So your definitions and rules are in the correct section for a flex file, but not for a JFlex file. (JFlex just copies the first section to the output, so it doesn't recognize the various syntax errors resulting from putting flex declarations where JFlex expects valid user code.)
Also, JFlex definitions are of the form name = pattern rather than name pattern, so once you get the order of the file sorted out, you'll also need to add the equals signs. And, of course, rewrite the C code in Java.

erlang: ei_get_type() : where are the defined constants for the 'type' field?

I am trying to use ei_get_type() (ei) but I am having trouble finding where the 'type' field is documented. I've looked in ei.h but all I could find was a list of constants starting with "ERL_".
#define ERL_SMALL_INTEGER_EXT 'a'
#define ERL_INTEGER_EXT 'b'
#define ERL_FLOAT_EXT 'c'
#define ERL_ATOM_EXT 'd'
#define ERL_REFERENCE_EXT 'e'
#define ERL_NEW_REFERENCE_EXT 'r'
#define ERL_PORT_EXT 'f'
#define ERL_PID_EXT 'g'
#define ERL_SMALL_TUPLE_EXT 'h'
#define ERL_LARGE_TUPLE_EXT 'i'
#define ERL_NIL_EXT 'j'
#define ERL_STRING_EXT 'k'
#define ERL_LIST_EXT 'l'
#define ERL_BINARY_EXT 'm'
#define ERL_SMALL_BIG_EXT 'n'
#define ERL_LARGE_BIG_EXT 'o'
#define ERL_NEW_FUN_EXT 'p'
#define ERL_FUN_EXT 'u'
Is this the correct list? I am unsure because the prototype of ei_get_type() has int * for the type field, whereas the ei.h file defines the above constants as chars.
NOTE: There are other 'constants' used in the 'erl_interface' package that aren't listed here.
According to the rest of the C code in Erlang (odbcserver.c, show_msg.c), this is what you should compare the value to.
Apparently these are the byte values used by the external binary format to mark the types of elements, and the get8 macro in putget.h simply returns this value.
I've been using ei to encode/decode Erlang terms from a C node for a couple of months now, and the constants you mention seem OK. The ones I'm using are:
LONG -> a
ATOM -> d
TUPLE -> h
EMPTY_LIST -> j
STRING -> k
LIST -> l
BINARY -> m
In the kind of messages I have to parse, I only receive these types.
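On the int versus char doubt: in C a character constant such as 'a' already has type int, so a value stored through an int * can be compared directly against these constants. A small sketch of that comparison follows. The #define values are copied from the ei.h excerpt in the question; TermKind and classify_term are names made up for this example, and the int argument stands for whatever ei_get_type() wrote into its type parameter.

```c
/* Tag values copied from the ei.h excerpt in the question. */
#define ERL_SMALL_INTEGER_EXT 'a'
#define ERL_INTEGER_EXT       'b'
#define ERL_ATOM_EXT          'd'
#define ERL_SMALL_TUPLE_EXT   'h'
#define ERL_LARGE_TUPLE_EXT   'i'
#define ERL_NIL_EXT           'j'
#define ERL_STRING_EXT        'k'
#define ERL_LIST_EXT          'l'
#define ERL_BINARY_EXT        'm'

/* Hypothetical categories for the types listed in the answer above. */
typedef enum TermKind {
    KIND_INTEGER, KIND_ATOM, KIND_TUPLE, KIND_EMPTY_LIST,
    KIND_STRING, KIND_LIST, KIND_BINARY, KIND_OTHER
} TermKind;

/* Classify a type tag; character constants are ints, so they work as case labels. */
TermKind classify_term(int type) {
    switch (type) {
    case ERL_SMALL_INTEGER_EXT:
    case ERL_INTEGER_EXT:     return KIND_INTEGER;
    case ERL_ATOM_EXT:        return KIND_ATOM;
    case ERL_SMALL_TUPLE_EXT:
    case ERL_LARGE_TUPLE_EXT: return KIND_TUPLE;
    case ERL_NIL_EXT:         return KIND_EMPTY_LIST;
    case ERL_STRING_EXT:      return KIND_STRING;
    case ERL_LIST_EXT:        return KIND_LIST;
    case ERL_BINARY_EXT:      return KIND_BINARY;
    default:                  return KIND_OTHER;
    }
}
```

Anything not covered by the switch (floats, pids, references and so on) falls through to KIND_OTHER, so extend it if your messages carry more types.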
