Is there an efficient algorithm for finding broken parenthesis in a block of text for the purpose of intellisense error highlighting?
For instance:
function f() {
var a = [1, 2, 3];
if ((a[1] < 1) || (a[0] > 2)) {
console.log((a + 5).toString());
}
}
Where any (, ), [, ], {, or } character might be dropped or adding in correctly and the correct issue might be highlighted, for instance spotting the specific statement, function, conditional, etc level item causing the issue?
The algorithm is not difficult:
Have a stack of characters
For each character in the code:
If it's an opening bracket, push it onto the stack
If it's a closing bracket, pop one char from the stack, both must match
At the end the stack must be empty
Then you could maybe highlight the unmatched bracket(s).
I think that one way of approaching your problem is to validate the matching bracket groups. This may be achieved using the regular expression - see: http://blog.stevenlevithan.com/archives/javascript-match-nested of Steven Levithan.
Related
I'm trying to match the prefix of the string Something. For example, If input So,SOM,SomeTH,some,S, it is all accepted because they are all prefixes of Something.
My code
Ss[oO]|Ss[omOMOmoM] {
printf("Accept Something": %s\n", yytext);
}
Input
Som
Output
Accept Something: So
Invalid Character
It's suppose to read Som because it is a prefix of Something. I don't get why my code doesn't work. Can anyone correct me on what I am doing wrong?
I don't know what you think the meaning of
Ss[oO]|Ss[omOMOmoM]
is, but what it matches is either:
an S followed by an s followed by exactly one of the letters o or O, or
an S followed by an s followed by exactly one of the letters o, O, m or M. Putting a symbol more than once inside a bracket expression has no effect.
Also, I don't see how that could produce the output you report. Perhaps there was a copy-and-paste error, or perhsps you have other pattern rules.
If you want to match prefixes, use nested optional matches:
s(o(m(e(t(h(i(ng?)?)?)?)?)?)?)?
If you want case-insensitive matcges, you could write out all the character classes, but that gets tiriesome; simpler is to use a case-insensitve flag:
(?i:s(o(m(e(t(h(i(ng?)?)?)?)?)?)?)?)
(?i: turns on the insensitive flag, until the matching close parenthesis.
In practice, this is probably not what you want. Normally, you will want to recognise a complete word as a token. You could then check to see if the word is a prefix in the rule action:
[[:alpha:]]+ { if (yyleng <= strlen("something") && 0 == strncasemp(yytext, "something", yyleng) {
/* do something */
}
}
There is lots of information in the Flex manual.
Right now your code (as shown) should only match "Sso" or "SsO" or "Ssm" or "SsM".
You have two alternatives that each start with Ss (without square brackets) so those will be matched literally. That's followed by either [oO] or [omOMomoM], but the characters in square brackets represent alternatives, so that's equivalent to [oOmM] --i.e., any one character of of o, O, m or M.
I'd start with: %option caseless to make it a case-insensitive scanner, so you don't have to list the upper- and lower-case equivalents of every letter.
Then it's probably easiest to just list the alternatives literally:
s|so|som|some|somet|someth|somethi|somethin|something { printf("found prefix"); }
I guess you can make the pattern a bit shorter (at least in the source code) by doing something on this order:
s(o(m(e(t(h(i(n(n(g)?)?)?)?)?)?)?)?)? { printf("found prefix"); }
Doesn't seem like a huge improvement to me, but some might find it more attractive than I do.
If you don't want to use %option caseless the basic idea helps more:
[sS]([oO]([mM]([eE]([tT]([hH]([iI]([nN]([gG])?)?)?)?)?)?)?)? { printf("found prefix"); }
Listing every possible combination of upper and lower case would get tedious.
I need to check wether matching parenthesis is present in a string that might have emoticons (like :) or :(). For example, "(:)())()", "(abcd)()ghijk)((mnop)qert)"
I have used the patterns "^[:\\(|:\\)]" to check for emoticons and "\\([^()]*\\)" to check for matching parenthesis present, but they are not detected. How can I do this?
The really simple solution to this problem is to count the parentheses, trying to solve it with regular expressions is hard though extended regular expressions can handle it. Here is a sketch of the simple algorithm:
Set openParenthesisCount to 0
Iterate over the string:
If current character is ( increment openParenthesisCount
If current character is ) decrement openParenthesisCount, if count goes negative then fail (too many closing)
If current character is : lookahead and skip next character if it is a parenthesis (skip smilies)
If openParenthesisCount is zero => succeed
HTH
As far as I can tell, you want to match a string if and only if it contains matching parentheses, after ignoring every occurrence of ":)" and ":(" in the string, if any.
So, try this:
^((?!:).)*\(.*(?<!:)\).*
It will match the following strings:
()
(abd)
(())(
(:))
(:(:))
(:)())()
(abcd)()ghijk)((mnop)qert)
(abc):
(:abc)
But will NOT match the following:
)(
(:)
(:(
:(:)
:()
(:)
:()(:)
(
)
(abc
abc)
(abc:)
:(abc)
I have the following code which enables me to make console output appear on the same line. However, if a value that was previously printed was of greater length than values after it, the remnants of the longer value will show up. I have seen other questions about the same thing in languages like Python, but I'm not sure how to overcome this in Rust.
Here's an example:
use std::io::prelude::*;
fn main() {
let fruits = ["Blueberry", "Orange", "Cherry", "Lemon", "Apple"];
print_value(&fruits);
}
fn print_value(e: &[&str]) {
for val in e {
print!("\rStatus: {}", val);
std::io::stdout().flush().unwrap();
// pause program temporarily
std::thread::sleep(std::time::Duration::new(2, 0));
}
}
Some terminals have a special character sequence that, when printed, clears the line to the right of the current cursor position.
VT100-compatible terminals have a character sequence EL0 for that. In Rust it can be expressed with "\x1B[K".
Here's a little thingy that might prove an example.
To do that in a more portable way you use a terminal library, such as term and it's delete_line method.
I've got a file with syntactically correct Lua 5.1 source code.
I've got a position (line and character offset) inside that file.
I need to get an offset in bytes to the closing parenthesis of the innermost function() body that contains that position (or figure out that the position belongs to the main chunk of the file).
I.e.:
local function foo()
^ result
print("bar")
^ input
end
local foo = function()
^ result
print("bar")
^ input
end
local foo = function()
return function()
^ result
print("bar")
^ input
end
end
...And so on.
How do I do that robustly?
EDIT: My original answer did not take into account the "innermost" requirement. I've since taken that into account
To make things "robust," there are a few considerations.
First of all, it's important that you skip over string and comment contents, to avoid incorrect output in situations like:
foo = function()
print(" function() ")
-- function()
print("bar")
^ input
end
This can be somewhat difficult, considering Lua's nested string and comment syntax. Consider, for example, a situation where the input begins in a nested string or comment:
foo = function()
print([[
bar = function()
print("baz")
^ input
end
]])
end
Consequently, if you want a completely robust system, it is not acceptable to only parse backwards until you hit the end of a function parameter list, because you may not have parsed backwards far enough to reach a [[ which would invalidate your match. It is therefore necessary to parse the entire file up to your position (unless you're okay with incorrect matches in these weird situations. If this is an editor plugin, these "incorrect" results may actually be desirable, because they would allow you to edit lua code which is stored in string literal form inside other lua code using the same plugin).
Because the particular syntax that you're trying to match doesn't have any kind of "nesting", a full-blown parser isn't needed. You will need to maintain a stack, however, to keep track of scope. With that in mind, all you need to do is step through the source file character-by-character from the beginning, applying the following logic:
Every time a " or ' is encountered, ignore the characters up to the closing " or '. Be careful to handle escapes like \" and \\
Every time a -- is encountered, ignore the characters up to the closing newline for the comment. Be careful to only do this if the comment is not a multiline comment.
Every time a multiline string opening symbol is encountered (such as [[, [=[, etc), or a multiline comment symbol is encountered (such as --[[ or --[=[, etc) ignore the characters up until the closing square brackets with the proper number of matching equals signs between them.
When a word boundary is encountered check to see if the characters after it could begin a block which ends with an end (for example, if, while, for, function, etc. DO NOT include repeat). If so, push the position on the scope stack. A "word boundary" in this case is any character which could not be used a lua identifier (this is to prevent matches in cases like abcfunction()). The beginning of the file is also considered a word boundary.
If a word boundary is encountered and it is followed by end, pop the top element of the stack. If the stack has no elements, complain about a syntax error.
When you finally step forward and reach your "input" position, pop elements from the stack until you find a function scope. Step forward from that position to the next ), ignoring )'s in comments (which could theoretically be found in an argument list if it spans multiple lines or contains inline --[[ ]] comments). That position is your result.
This should handle every case, including situations where the function syntactic sugar is used, like
function foo()
print("bar")
end
which you did not include in your example but which I imagine you still want to match.
I am trying to use the regular expression (?r-s:pattern) as mentioned in the Flex manual.
Following code works only when i input small letter 'a' and not the caps 'A'
%%
[(?i:a)] { printf("color"); }
\n { printf("NEWLINE\n"); return EOL;}
. { printf("Mystery character %s\n", yytext); }
%%
OUTPUT
a
colorNEWLINE
A
Mystery character A
NEWLINE
Reverse is also true i.e. if i change the line (?i:a) to (?i:A) it only considers 'A' as valid input and not 'a'.
If I remove the square brackets i.e. [] it gives error as
"ex1.lex", line 2: unrecognized rule
If I enclose the "(?i:a)" then it compiles but after executing it always goes to last rule i.e. "Mystery character..."
Please let me know how to use it properly.
I guess I am late.. :) Anyway, which flex version are you using, I have version 2.5.35 installed and correctly recognizes above pattern. Perhaps you're using old version!!!
Now regarding the enclosing with [] brackets. It works because as per [] regex rule it will try to match any of individual (, ?, i, :, a or ). Thats why a gets recognized and not A (because it is not in the list).
The way I read the manual, the rule without the square brackets should perform the case-insensitive matching you're looking for--I can't explain why you get an error at compile time. But you can achieve the same behavior in one of two ways. One, you can enumerate the upper and lower case characters in the character class:
%%
[Aa] { printf("color"); }
%%
Two, you can specify the case-insensitive scanner option, either on the command line as -i or --case-insensitive or in your .l file:
%%
%option case-insensitive
[a] {printf("color"); }
%%