I tried this code, but the result is just an uppercase first letter (not each letter):
String ucwords(String input) {
  if (input == null) {
    throw new ArgumentError("string: $input");
  }
  if (input.length == 0) {
    return input;
  }
  return input[0].toUpperCase() + input.substring(1);
}
I suggest you read the documentation for toUpperCase(). It gives an example of exactly what you want to do.
Your code should read:
return input.toUpperCase();
Well, I didn't understand your question completely, but if you want to put each letter in uppercase, meaning the whole string in uppercase, you can just use .toUpperCase(), like this:
//This will print 'LIKE THIS'
print('like this'.toUpperCase());
But if you want to put a specific letter in uppercase, you can use the text_tools package:
https://pub.dev/packages/text_tools
Example
// This will put the letter at position 2 in uppercase and print 'liKe this'
print(TextTools.toUppercaseAnyLetter(text: 'like this', position: 2));
I hope this is of some help.
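And if what you actually want is PHP-style ucwords (capitalize the first letter of each word), the logic is just split, capitalize each piece, and join. Here is a sketch of the idea; it's written in TypeScript purely for illustration, but it maps one-to-one onto Dart's split, toUpperCase, substring, and join:
// Sketch: capitalize the first letter of each whitespace-separated word,
// PHP ucwords-style. Illustrative only; not a Dart or text_tools API.
function ucwords(input: string): string {
  return input
    .split(" ")
    .map(w => (w.length === 0 ? w : w[0].toUpperCase() + w.slice(1)))
    .join(" ");
}

console.log(ucwords("like this")); // prints 'Like This'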
Related
I'm working on a language similar to Ruby, called Gaiman, and I'm using PEG.js to generate the parser.
Do you know if there is a way to implement heredocs with proper indentation?
xxx = <<<END
      hello
      world
END
The output should be:
"hello
world"
I need this because this code doesn't look very nice:
def foo(arg) {
    if arg == "here" then
        return <<<END
xxx
xxx
END
    end
end
This is a function where the user wants to return:
"xxx
xxx"
I would prefer the code to look like this:
def foo(arg) {
    if arg == "here" then
        return <<<END
        xxx
        xxx
        END
    end
end
If I just trim all the lines, the user will not be able to use a string with leading spaces when he wants one. Does anyone know if PEG.js allows this?
I don't have any code for heredocs yet; I just want to be sure that what I want is possible.
EDIT:
So I've tried to implement heredocs, and the problem is that PEG.js doesn't allow back-references.
heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
return text.join('');
}
It says that marker is not defined. As for trimming, I think I can use the location() function.
I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.
For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)
Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
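To make that concrete, here is a rough sketch of such a scanning function, written in plain TypeScript and independent of any parser generator. All the names are mine, and tab-to-column handling is deliberately omitted:
// Rough sketch of a custom here-string scanner. `pos` points just past
// the "<<<" token; `startColumn` is the 0-based column where "<<<" began.
function scanHereString(
  src: string,
  pos: number,
  startColumn: number
): { value: string; end: number } | null {
  const delimEnd = src.indexOf("\n", pos);
  if (delimEnd < 0) return null;
  const delim = src.slice(pos, delimEnd).trim(); // e.g. "END"
  const indent = " ".repeat(startColumn);
  const lines: string[] = [];
  let cursor = delimEnd + 1;
  while (cursor <= src.length) {
    let eol = src.indexOf("\n", cursor);
    if (eol < 0) eol = src.length;
    const line = src.slice(cursor, eol);
    if (line.trim() === delim) {
      // End delimiter found: strip the expected indentation. A line that
      // does not carry the full indent violates the requirement; here we
      // just leave it untouched, but a real scanner should report it.
      const value = lines
        .map(l => (l.startsWith(indent) ? l.slice(startColumn) : l))
        .join("\n");
      return { value, end: eol };
    }
    lines.push(line);
    cursor = eol + 1;
  }
  return null; // unterminated here-string
}
The only important part is the shape: accumulate lines until one consists solely of the delimiter, then strip the indentation implied by the start column.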
I don't know if peg.js corresponds with those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication as to how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, and otherwise find a different parser generator which will work for you in this use case.
Here is an implementation of heredocs in Peggy, the successor to PEG.js (which is not maintained anymore). This code was based on the GitHub issue.
heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
&{ return begin === end; }
/ '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`\\s{${min}}`);
return text.map(line => {
return line[0].replace(re, '');
}).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+
_ "whitespace"
= [ \t\n\r]* { return []; }
EDIT: the above didn't work with another piece of code after the heredoc; here is a better grammar:
{ let heredoc_begin = null; }

heredoc = "<<<" beginMarker "\n" text:content endMarker {
  const loc = location();
  const min = loc.start.column - 1;
  const re = new RegExp(`^\\s{${min}}`, 'mg');
  return {
    type: 'Literal',
    value: text.replace(re, '')
  };
}

__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*
I wanted to match a string which starts with a '#' and then matches everything until it reaches the character that followed the '#'. This can almost be achieved using capturing groups, like this: #(.)[^(?1)]*(?1) (EDIT: this regex is also erroneous). It matches #$foo$, does not match #%bar&, and matches the first 6 characters of #"foo"bar.
But since flex does not support capturing groups, what is the workaround here?
As you say, (f)lex does not support capturing groups, and it certainly doesn't support backreferences.
So there is no simple workaround, but there are workarounds. Here are a few possibilities:
You can read the input one character at a time using the input() function, until you find the matching character (but you have to create your own buffer to store the characters, because characters read by input() are not added to the current token). This is not the most efficient because reading one character at a time is a bit clunky, but it's the only interface that (f)lex offers. (The following snippet assumes you have some kind of expandable stringBuilder; if you are using C++, this would just be replaced with a std::string.)
#.      { StringBuilder sb = string_builder_new();
          int delim = yytext[1];
          for (;;) {
            int next = input();
            if (next == delim) break;
            if (next == EOF) { /* Signal error */ break; }
            string_builder_addchar(sb, next);
          }
          yylval = string_builder_release(sb);
          return DELIMITED_STRING;
        }
Even less efficiently, but perhaps more conveniently, you can get (f)lex to accumulate the characters in yytext using yymore(), matching one character at a time in a start condition:
%x DELIMITED
%%
  int delim;
#.                 { delim = yytext[1]; BEGIN(DELIMITED); }
<DELIMITED>.|\n    { if (yytext[yyleng - 1] == delim) {
                       /* Drop the closing delimiter from the stored value */
                       yylval = strndup(yytext, yyleng - 1);
                       BEGIN(INITIAL);
                       return DELIMITED_STRING;
                     }
                     yymore();
                   }
<DELIMITED><<EOF>> { /* Signal unterminated string error */ }
The most efficient solution (in (f)lex) is to just write one rule for each possible delimiter. While that's a lot of rules, they could be easily generated with a small script in whatever scripting language you prefer. And, actually, there are not that many rules, particularly if you don't allow alphabetic and non-printing characters to be delimiters. This has the additional advantage that if you want Perl-like parenthetic delimiters (#(Hello) instead of #(Hello(), you can just modify the individual pattern to suit (as I've done below). [Note 1] Since all the actions are the same, it might be easier to use a macro for the action, making it easier to modify.
/* Ordinary punctuation */
#:[^:]*: { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#![^!]*! { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#\.[^.]*\. { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
/* Matched pairs */
#<[^>]*> { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#\[[^]]*] { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
/* Trap errors */
# { /* Report unmatched or invalid delimiter error */ }
If I were writing a script to generate these rules, I would use hexadecimal escapes for all the delimiter characters rather than trying to figure out which ones needed escapes.
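For instance, a throwaway generator along these lines would do; this is only a sketch (the delimiter set is illustrative), written here in TypeScript:
// Sketch: emit one flex rule per single-character delimiter, using
// hexadecimal escapes so no delimiter needs any escaping logic.
const action =
  "{ yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }";
const delimiters = ":;!.,%^&*-_+='\"|~@"; // choose your own set
for (const d of delimiters) {
  const hex = "\\x" + d.charCodeAt(0).toString(16).padStart(2, "0");
  console.log(`#${hex}[^${hex}]*${hex}  ${action}`);
}
// Matched pairs (#<...>, #[...]) differ at the closing end, so they
// would be emitted from a small open/close table instead.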
Notes:
1. Perl requires nested balanced parentheses in constructs like that. But you can't do that with regular expressions; if you wanted to reproduce Perl behaviour, you'd need to use some variation on one of the other suggestions. I'll try to revisit this answer later to address that feature.
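(For what it's worth, the nesting itself is just depth counting, which any character-at-a-time scanner can do; a sketch, with illustrative names:)
// Depth-counting sketch for Perl-style nested delimiters, e.g. #(a(b)c).
// Returns the contents between the outer pair, or null if unterminated.
function scanNested(src: string, open: string, close: string): string | null {
  // Assumes src starts with '#' followed by the opening delimiter.
  let depth = 0;
  for (let i = 1; i < src.length; i++) {
    if (src[i] === open) depth++;
    else if (src[i] === close && --depth === 0) return src.slice(2, i);
  }
  return null;
}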
I am trying to navigate through an MSHTML (TWebBrowser) document word by word, spell checking each word. But I have problems:
1) It selects words oddly. For example, with "Abc: def", the selection is "Abc" as the first word and ":" as the second. Also, if a word is followed by a space, for example "Abc def", it selects "Abc " with the trailing space included. Is there a way to select only the word itself and not the trailing spaces or non-word characters? Or is this the best that move() can offer, and it is up to the programmer to filter the words correctly?
2) Is there a way to specify which delimiters it will use, for example to let it use only a space as a delimiter and not colons, dashes, or other characters? (Microsoft's docs say: "Word - Moves one or more words. A word is a collection of characters terminated by a space or other white space character." But it also uses other delimiters, not just spaces, tabs, and similar.)
Current code is:
bool SelectNext(bool firstonly = false)
{
    DelphiInterface<IHTMLDocument2> diDoc = WB->Document;
    if (diDoc)
    {
        DelphiInterface<IHTMLSelectionObject> diSel = diDoc->selection;
        if (diSel)
        {
            if (firstonly) diSel->empty();
            if (diSel->type_ == "None" || diSel->type_ == "Text")
            {
                DelphiInterface<IDispatch> diRangeDisp;
                if (SUCCEEDED(diSel->createRange(diRangeDisp)) && diRangeDisp)
                {
                    DelphiInterface<IHTMLTxtRange> diRange;
                    if (SUCCEEDED(diRangeDisp->QueryInterface(IID_IHTMLTxtRange, reinterpret_cast<void**>(&diRange))) && diRange)
                    {
                        int Result;
                        unsigned short Res;
                        diRange->move("word", 1, Result);
                        diRange->expand("word", Res);
                        diRange->select();
                        return true;
                    }
                }
            }
        }
    }
    return false;
}
SelectNext(true); // Select the first "word"
// navigate the first 200 words
for (int i = 0; i <= 200; i++)
{
    SelectNext();
    // do spell checking of the selected word here...
}
I have written a flex lexer to handle the text in BYOND's .dmi file format. The contents inside are (key, value) pairs delimited by '='. Valid keys are all essentially keywords (such as "width"), and invalid keys are not errors: they are just ignored.
Interestingly, the current state of BYOND's .dmi parser uses everything prior to the '=' as its keyword, and simply ignores any excess junk. This means "\twidth123" is recognized as "width".
The crux of my problem is in allowing for this irregularity. In doing so my generated lexer expands from ~40-50KB to ~13-14MB. For reference, I present the following contrived example:
%option c++ noyywrap
fill [^=#\n]*
%%
{fill}version{fill} { return 0; }
{fill}width{fill} { return 0; }
{fill}height{fill} { return 0; }
{fill}state{fill} { return 0; }
{fill}dirs{fill} { return 0; }
{fill}frames{fill} { return 0; }
{fill}delay{fill} { return 0; }
{fill}loop{fill} { return 0; }
{fill}rewind{fill} { return 0; }
{fill}movement{fill} { return 0; }
{fill}hotspot{fill} { return 0; }
%%
fill is the definition used to merge the keywords with "anything before the =". Running flex on the above yields a ~13MB lex.yy.cc on my computer. Simply removing the Kleene star (*) in the fill definition yields a 45KB lex.yy.cc file; however, obviously, this then makes the lexer incorrect.
Are there any tricks, flex options, or lexer hacks to avoid this insane expansion? The only things I can think of are:
Disallow "width123" to represent "width", which is undesirable as then technically-correct files could not be parsed.
Make one rule that is simply [^=\n]+ to return some identifier token, and pick out the keyword in the parser. This seems suboptimal to me as well, particularly because different keywords have different value types and it seems most natural to be able to handle "'width' '=' INT" and "'version' '=' FLOAT" in the parser instead of "ID '=' VALUE" followed by picking out the keyword in the identifier, making sure the value is of the right type, etc.
I could make the rule {fill}(width|height|version|...){fill}, which does indeed keep the generated file small. However, while regular expression parsers tend to produce "captures," flex just gives me yytext and re-parsing that for a keyword to produce the desired token seems to be very undesirable in terms of algorithmic complexity.
Make fill a separate rule of its own that does nothing, and remove it from all the other rules, and separate its definition from whitespace for clarity:
whitespace [ \t\f]
fill [^#=\n]
%%
{whitespace}+ ;
{fill}+ ;
I would probably also avoid building the keywords into the lexer and just use an identifier [a-zA-Z]+ rule that does a table lookup. And finally add a rule to catch the =:
. return yytext[0];
to let the parser handle all special characters.
This is not really a problem flex is "good at", but it can be solved if it is precisely defined. In particular, it is important to know which of the keywords should be returned if the random string of letters before the = contains more than one keyword. For example, suppose the input is:
garbage_widtheight_moregarbage = 42
Now, is that setting the width or the height?
Remember that flex scanners will choose the rule with the longest match, and of rules with equally long matches, the first one in the lexical description.
So the model presented in the OP:
fill [^=#\n]*
%%
{fill}width{fill} { return 0; }
{fill}height{fill} { return 0; }
/* SNIP */
will always prefer width to height, because the matches will be the same length (both terminate at the last character before the =), and the width pattern comes first in the file. If the rules were written in the opposite order, height would be preferred.
On the other hand, if you removed the second {fill}:
{fill}width    { return 0; }
{fill}height   { return 0; }
then the last keyword in the input (in this case, height) will be preferred, because that one has the longer match.
The most likely requirement, however, is that the first keyword be recognized, so neither of the preceding will work. In order to match the first keyword, it is necessary to first match the shortest possible sequence of {fill}. And since flex does not implement non-greedy repetition, that can only be done with a character-by-character span.
Here's an example, using start conditions. Note that we hold onto the keyword token until we actually find the =, in case the = is not found.
/* INITIAL: beginning of a line
* FIND_EQUAL: keyword recognized, looking for the =
* VALUE: = recognized, lexing the right-hand side
* NEXT_LINE: find the next line and continue the scan
*/
%x FIND_EQUAL VALUE NEXT_LINE
%%
  int keyword;
[#=].*            /* Skip comments and lines with no recognizable keyword */
version           { keyword = KW_VERSION; BEGIN(FIND_EQUAL); }
width             { keyword = KW_WIDTH; BEGIN(FIND_EQUAL); }
height            { keyword = KW_HEIGHT; BEGIN(FIND_EQUAL); }
  /* etc. */
.|\n              /* Skip any other single character, or newline */

<FIND_EQUAL>{
  [^=#\n]*"="     { BEGIN(VALUE); return keyword; }
  "#".*           { BEGIN(INITIAL); }
  \n              { BEGIN(INITIAL); }
}

<VALUE>{
  "#".*           { BEGIN(INITIAL); }
  \n              { BEGIN(INITIAL); }
  [[:blank:]]+    ; /* Ignore space and tab characters */
  [[:digit:]]+    { yylval.ival = atoi(yytext);
                    BEGIN(NEXT_LINE); return INTEGER;
                  }
  [[:digit:]]+"."[[:digit:]]*|"."[[:digit:]]+ {
                    yylval.fval = atof(yytext);
                    BEGIN(NEXT_LINE); return FLOAT;
                  }
  \"([^"]|\\.)*\" { char* s = malloc(yyleng - 1);
                    yylval.sval = s;
                    /* Remove quotes and escape characters */
                    yytext[yyleng - 1] = '\0';
                    do {
                      if (*++yytext == '\\') ++yytext;
                      *s++ = *yytext;
                    } while (*yytext);
                    BEGIN(NEXT_LINE); return STRING;
                  }
  /* Other possible value token types */
  .               BEGIN(NEXT_LINE); /* bad character in value */
}

<NEXT_LINE>.*\n?  BEGIN(INITIAL);
In the escape-removal code, you might want to translate things like \n. And you might also want to avoid string values with physical newlines. And a bunch of etceteras. It's only intended as a model.
Using grep, vim's grep, or another Unix shell command, I'd like to find the functions in a large .cpp file that contain a specific word in their body.
In the files that I'm working with, the word I'm looking for is on an indented line, and the corresponding function header is the first line above the indented line that starts at position 0 and is not a '{'.
For example, searching for JOHN_DOE in the following code snippet
int foo ( int arg1 )
{
    /// code
}

void bar ( std::string arg2 )
{
    /// code
    aFunctionCall( JOHN_DOE );
    /// more code
}
should give me
void bar ( std::string arg2 )
The algorithm that I hope to catch in grep/vim/unix shell scripts would probably best use the indentation and formatting assumptions, rather than attempting to parse C/C++.
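Spelled out, the heuristic would be something like the following sketch (TypeScript only to pin down the logic; the names are mine):
// Remember the last line that starts at column 0 and is not '{';
// report it whenever the search word appears on a later (indented) line.
import { readFileSync } from "fs";

function findEnclosingHeaders(path: string, word: string): string[] {
  const hits: string[] = [];
  let header = "";
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (line.length > 0 && !/[\s{]/.test(line[0])) {
      header = line; // candidate function header
    } else if (line.includes(word)) {
      hits.push(header); // word found inside an indented body line
    }
  }
  return hits;
}

console.log(findEnclosingHeaders("my_source.cpp", "JOHN_DOE"));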
Thanks for your suggestions.
I'll probably get voted down for this!
I am an avid (G)VIM user but when I want to review or understand some code I use Source Insight. I almost never use it as an actual editor though.
It does exactly what you want in this case, e.g. show all the functions/methods that use some highlighted data type/define/constant/etc... in a relations window...
(source: sourceinsight.com)
Ouch! There goes my rep.
As far as I know, this can't be done. Here's why:
First, you have to search across lines. No problem: in vim, prefixing \_ to an atom or character class tells it to include newlines, so {\_.*} would match everything between those brackets across multiple lines.
So now you need to match whatever the pattern is for a function header (brittle even if you get it to work), then, and here's the problem, whatever lines are between it and your search string, and finally match your search string. So you might have a regex like
/^\(void \+\a\+ *(.*)\)\_.*JOHN_DOE
But what happens is that the first time vim finds a function header, it starts matching. It then matches every character until it finds JOHN_DOE, which includes every other function header in between.
So the problem is that, as far as I know, there's no way to tell vim to match every character except for this regex pattern. And even if there were, a regex is not the tool for this job. It's like opening a beer with a hammer. What we should do is write a simple script that gives you this info, and I have.
fun! FindMyFunction(searchPattern, funcPattern)
    call search(a:searchPattern)
    let lineNumber = line(".") - 1
    "call setpos(".", [0, lineNumber, 0, 0])
    let lineString = getline(lineNumber)
    while lineString !~ a:funcPattern
        let lineNumber = lineNumber - 1
        if lineNumber < 0
            echo "Function not found :/"
            return
        endif
        let lineString = getline(lineNumber)
    endwhile
    echo lineString
endfunction
That should give you the result you want and it's way easier to share, debug, and repurpose than a regular expression spit from the mouth of Cthulhu himself.
Tough call, although as a starting point I would suggest this wonderful VIM Regex Tutorial.
You cannot do that reliably with a regular expression, because code is not a regular language. You need a real parser for the language in question.
Arggh! I admit this is a bit over the top:
A little program to filter stdin, strip comments, and put function bodies on the same line. It'll get fooled by namespaces and function definitions inside class declarations, besides other things. But it might be a good start:
#include <stdio.h>
#include <assert.h>
int main() {
    enum {
        NORMAL,
        LINE_COMMENT,
        MULTI_COMMENT,
        IN_STRING,
    } state = NORMAL;
    unsigned depth = 0;
    for (char c = getchar(), prev = 0; !feof(stdin); prev = c, c = getchar()) {
        switch (state) {
        case NORMAL:
            if ('/' == c && '/' == prev)
                state = LINE_COMMENT;
            else if ('*' == c && '/' == prev)
                state = MULTI_COMMENT;
            else if ('#' == c)
                state = LINE_COMMENT;
            else if ('\"' == c) {
                state = IN_STRING;
                putchar(c);
            } else {
                if (('}' == c && !--depth) || (';' == c && !depth)) {
                    putchar(c);
                    putchar('\n');
                } else {
                    if ('{' == c)
                        depth++;
                    else if ('/' == prev && NORMAL == state)
                        putchar(prev);
                    else if ('\t' == c)
                        c = ' ';
                    if (' ' == c && ' ' != prev)
                        putchar(c);
                    else if (' ' < c && '/' != c)
                        putchar(c);
                }
            }
            break;
        case LINE_COMMENT:
            if (' ' > c)
                state = NORMAL;
            break;
        case MULTI_COMMENT:
            if ('/' == c && '*' == prev) {
                c = '\0';
                state = NORMAL;
            }
            break;
        case IN_STRING:
            if ('\"' == c && '\\' != prev)
                state = NORMAL;
            putchar(c);
            break;
        default:
            assert(!"bug");
        }
    }
    putchar('\n');
    return 0;
}
It's C++, so just put it in a file, compile it to an executable named 'stripper', and then:
cat my_source.cpp | ./stripper | grep JOHN_DOE
So consider the input:
int foo ( int arg1 )
{
    /// code
}

void bar ( std::string arg2 )
{
    /// code
    aFunctionCall( JOHN_DOE );
    /// more code
}
The output of "cat example.cpp | ./stripper" is:
int foo ( int arg1 ) { }
void bar ( std::string arg2 ){ aFunctionCall( JOHN_DOE ); }
The output of "cat example.cpp | ./stripper | grep JOHN_DOE" is:
void bar ( std::string arg2 ){ aFunctionCall( JOHN_DOE ); }
The job of finding the function name (I guess it's the last identifier to precede a "(") is left as an exercise for the reader.
For that kind of stuff, although it comes down to primitive searching again, I would recommend the compview plugin. It will open up a search window, so you can see the entire line where the search occurred, and automatically jump to it. Gives a nice overview.
(source: axisym3.net)
Like Robert said, regex will help. In command mode, start a regex search by typing the "/" character followed by your regex.
Ctags may also be of use to you. It can generate a tags file for a project. This tags file allows a user to jump directly from a function call to its definition, even if it's in another file, using "CTRL+]".
You can use grep -r -n -H JOHN_DOE *. It will look for "JOHN_DOE" in the files recursively, starting from the current directory.
You can use the following code to find the function which contains the text expression:
public void findFunction(File file, String expression) {
    String lineWithNameOfFunction = "";
    boolean matchFound = false;
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String match;
        while ((match = br.readLine()) != null) {
            if (match.endsWith(") {") ||
                match.endsWith("){") ||
                match.endsWith("()") ||
                match.endsWith(")") ||
                match.endsWith("( )")) {
                // this here is because I guessed that a method header
                // will start at column 0
                if (!match.isEmpty() && match.charAt(0) != ' ' && !match.startsWith("\t")) {
                    lineWithNameOfFunction = match;
                }
            }
            if (match.contains(expression)) {
                matchFound = true;
                break;
            }
        }
        if (matchFound)
            System.out.println(lineWithNameOfFunction);
        else
            System.out.println("No matching function found");
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
I wrote this in Java, tested it, and it works like a charm. It has a few drawbacks, but it's fine for starters. I didn't add support for multiple functions containing the same expression, and maybe some other things. Try it.