How to prevent clang-format from moving braces to a different line? - clang-format

In the following snippets I have used the "K & R" style bracing for the "if" body and the "brace under" style for the "else" body. I realize that good practice requires that these or any other bracing styles be consistent, but I have a good reason for wanting to keep them the way they are.
The problem is that clang-format doesn't seem to have an option to not change brace styles to the style that has been selected (LLVM, Google, etc.) Instead, it will move any braces that do not conform to that style to different lines if necessary.
The clang-format documentation states that I can use the comments /* clang-format off / and / clang-format on */ to disable formatting within a delimited range. I believe that's what I've done in the second version of the snippet below, but the braces still get moved to match the selected style anyway. Is there any way to prevent the formatter from formatting specific items?
if (1 < 2) {
;
}
else
{
;
}
if (1 < 2) /* clang-format off */{/* clang-format on */
;
/* clang-format off */}/* clang-format on */
/* clang-format off */else/* clang-format on */
/* clang-format off */{/* clang-format on */
;
}

Related

is there a way to have clang format to require (or not require), parentheses around the argument?

Want clang-format to enforce a C language return keyword style. I'd prefer:
return (foo);
but can live without the parentheses, I just want clang-format to do enforcement. -- Thanks!

flex scanner push-back overflow with automata

I am having a hard time with this problem.
"Write a flex code which recognizes a chain with alphabet {0,1}, with at least 5 char's, and to every consecutive 5 char's there will bee at least 3 1's"
I thought I have solved, but I am new using flex, so I am getting this "flex scanner push-back overflow".
here's my code
%{
#define ACCEPT 1
#define DONT 2
%}
delim [ \t\n\r]
ws {delim}+
comb01 00111|{comb06}1
comb02 01011|{comb07}1
comb03 01101|{comb08}1
comb04 01110|({comb01}|{comb09})0
comb05 01111|({comb01}|{comb09})1
comb06 10011|{comb10}1
comb07 10101|{comb11}1
comb08 10110|({comb02}|{comb12})0
comb09 10111|({comb02}|{comb12})1
comb10 11001|{comb13}1
comb11 11010|({comb03}|{comb14})0
comb12 11011|({comb03}|{comb14})1
comb13 11100|({comb04}|{comb15})0
comb14 11101|({comb04}|{comb15})1
comb15 11110|({comb05}|{comb16})0
comb16 11111|({comb05}|{comb16})1
accept {comb01}|{comb02}|{comb03}|{comb04}|{comb05}|{comb06}|{comb07}|{comb08}|{comb09}|{comb10}|{comb11}|{comb12}|{comb13}|{comb14}|{comb15}|{comb16}
string [^ \t\n\r]+
%%
{ws} { ;}
{accept} {return ACCEPT;}
{string} {return DONT;}
%%
void main () {
int i;
while (i = yylex ())
switch (i) {
case ACCEPT:
printf ("%-20s: ACCEPT\n", yytext);
break;
case DONT:
printf ("%-20s: Reject\n", yytext);
break;
}
}
Flex definitions are macros, and flex implements them that way: when it sees {defn} in a pattern, it replaces it with whatever defn was defined as (in parentheses, usually, to avoid operator precedence issues). It doesn't expand the macros in the macro definition, so the macro substitution might contain more definition references which in turn need to be substituted.
Since macro substitution is unconditional, it is not possible to use recursive macros, including macros which are indirectly recursive. Which yours are. Flex doesn't check for this condition, unlike the C preprocessor; it just continues substituting in an endless loop until it runs out of space.
(Flex is implemented using itself; it does the macro substitution using unput. unput will not resize the input buffer, so "runs out of space" here means that flex's internal flex's input buffer became full of macro substitutions.)
The strategy you are using would work fine as a context-free grammar. But that's not flex. Flex is about regular expressions. The pattern you want to match can be described by a regular expression -- the "grammar" you wrote with flex macros is a regular grammar -- but it is not a regular expression and flex won't make one out of it for you, unfortunately. That's your job.
I don't think it's going to be a very pretty regular expression. In fact, I think it's likely to be enormous. But I didn't try working it out..
There are flex tricks you could use to avoid constructing the regular expression. For example, you could build your state machine out of flex start conditions and then scan one character at a time, where each character scanned does a state transition or throws an error. (Use more() if you want to return the entire string scanned at the end.)

Flex scanning, differentiating between string (with single spaces) and padding (more than one space)

I am having trouble with flex to scan lines that looks something like this
DESCRIPTION This is the device description
I would like the line to be scanned such that DESCRIPTION is one token and "This is the device description" is the other.
I have been playing endlessly with my rules but cannot seem to get it to work.
From the documentation I think I want to implement a rule using
`r/s'
an r but only if it is followed by an s
where spaces are only accepted is they are followed by something that is not a while space. I have no idea how to write this rule with flex's syntax. In my mind the rule should be something like
[a-zA-Z](" "/[a-zA-Z0-9]|[a-zA-Z0-9])* return IDENTIFIER;
But this is invalid.
I can get the lines to chop up each word but I cannot get the rules to differentiate between 1 space and 1 < spaces. Halp.
This is not really a good match for flex, since the recognition of tokens is context-dependent. You can achieve context-dependent scanning using start conditions but excessive use of start conditions is often an indication that some other scanning mechanism would be better.
Regardless of how you do it, the key is figuring out exactly how to decide on the token division. Consider the following four lines, for example:
DEVICE This is the device
MODE This is the mode
DESCRIPTION This is the device description
UNDOCUMENTED FIELD
Of course, it is possible that the corner cases represented by the third and fourth lines never show up in any of your inputs.
If the first token cannot include whitespace, then the problem is relatively simple, although you still need a start condition (and I'm going to assume you read the documentation linked above):
%x WHITE WORDS
%%
/* Possibly should be [[:alpha:]] instead of [[:upper:]] */
[[:upper:]]+ { /* copy yytext */; BEGIN(WHITE); return KEYWORD; }
/* Handle other possible line beginnings */
<WHITE>\n { /* Blank descriptive text */; BEGIN(INITIAL); }
<WHITE>[ \t]+ { BEGIN(WORDS); }
<WHITE>. { /* Something not correct in this line */; ... }
<WORDS>.+ { /* copy yytext */; BEGIN(INITIAL); return DESCRIPTION; }
<WORDS>\n { BEGIN(INITIAL); }
If there might be whitespace in the first token but never two spaces in a row, you could replace the first pattern above with:
[[:alpha:]]+( [[:alpha:]]+)*
which will match any sequence of words (consisting only of letters) where there is exactly one space between successive words. Like the original pattern above, this will end on the first non-alphabetic character found. That error will be detected by the rules in <WHITE>, because any non-whitespace character encountered when that start condition becomes active will be handled by the start condition's default rule (the <WHITE>. rule).
My opinion is that you are using the wrong horse here. lex (flex) should be only used for lexical analysis and yacc (or bison) for syntactic one. Saying that one single character is not a separator but multiple are is not appropriate for a lexer.
My opinion is that lex should only reports words and padding and that yacc should later re-combine words that are not separated by padding elements.
The lex part would be as simple as:
[[:alnum:]_]+ {
// printf("WORD: >%s<\n", yytext); // for debugging
return WORD;
}
[[:blank:]]{2,} {
// printf("PADDING: >%s<\n", yytext);
return PADDING;
}
and the yacc part would contain:
elt: PADDING
| ident
ident: WORD
| ident WORD
action are omitted here because they depend too much on your actual processing.

Valid regular expression for identifier using flex

I'm trying to make a regular expression that will only work when a valid identifier name is given, using flex (the name cannot start with a number). I'm using this code :
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
%}
%%
"if" { printf("IF "); }
[a-zA-Z_][a-zA-Z_0-9]* { printf("%s ", yytext); }
%%
int main() {
yylex();
}
but it is not working. how to make sure that flex accepts only a valid identifier?
When I provide the input:
if
abc
9abc
I see the following output:
IF
abc
9abc
but I expected:
IF
abc
(nothing)
Your patterns do not match all possible inputs.
In such cases, (f)lex adds a default catch-all rule, of the form
.|\n { ECHO; }
In other words, any character not recognized by your patterns will simply be printed on stdout. That will be the case with the newline characters in your input, as well as with the digit 9. After the 9 is recognized by the default rule, the remaining input will again be recognized by your identifier rule.
So you probably wanted something like this:
%option warn nodefault
%%
[[:space:]]+ ; /* Ignore whitespace */
"if" { /* TODO: Handle an "if" token */ }
[[:alpha:]_][[:alnum:]_]* { /* TODO: Handle an identifier token */ }
. { /* TODO: Handle an error */ }
Instead of printing information to stdout in an action as a debugging or learning aid, I strongly suggest you use the -T (or --trace) option when you are building your scanner. That will automatically output debugging information in a consistent and complete manner; it would have told you that the default rule was being matched, for example.
Notes:
%option nodefault tells flex not to insert a default rule. I recommend always using it, because it will keep you out of trouble. The warn option ensures that a warning is issued in this case; I think that warn is default flex behaviour but the manual suggests using it and it cannot hurt.
It's good style to use standard character class expressions. Inside a character class ([…]), [:xxx:] matches anything for which the standard library function isxxx would return true. So [[:space:]]+ matches one or more whitespace characters, including space, tab, and newline (and some others), [[:alpha:]_] matches any letter or an underscore, and [[:alnum:]_]* matches any number (including 0) of letters, digits, or underscores. See the Patterns section of the manual.

Flex regular expression for comments

I'm trying to learn flex and having trouble with a regular expression to catch comments.
Assuming a comment begins with // and runs to the end of the line, I would like the program to recognize the entire comment and set yytext equal to it.
So far ["//".*$] is not cutting the mustard.
Thank you
Putting your text in square brackets creates a character class matching any one character from among those between the brackets. Also, quotation marks are not special in Flex's regex syntax. You want something along these lines:
/* definitions (for more readable rules) */
/* The \134 are octal escapes for the '/' character, for clarity: */
CMNT_START \134\134
%%
/* rules */
{CMNT_START}.*$ /* yytext automatically contains the matched text*/;

Resources