file, _ := os.Open("x.txt")
f := bufio.NewReader(file)
for {
read_line, _ := ReadString('\n')
fmt.Print(read_line)
// other code what work with parsed line...
}
end it add \n on every line , end program to work , only work with last line...
Please put example, i try anything end any solution what i find here not work for me.
You can slice off the last character:
read_line = read_line[:len(read_line)-1]
Perhaps a better approach is to use the strings library:
read_line = strings.TrimSuffix(read_line, "\n")
If you want to read a file line-by-line, using bufio.Scanner will be easier. Scanner won't includes endline (\n or \r\n) into the line.
file, err := os.Open("yourfile.txt")
if err != nil {
//handle error
return
}
defer file.Close()
s := bufio.NewScanner(file)
for s.Scan() {
read_line := s.Text()
// other code what work with parsed line...
}
If you want to remove endline, I suggest you to use TrimRightFunc, i.e.
read_line = strings.TrimRightFunc(read_line, func(c rune) bool {
//In windows newline is \r\n
return c == '\r' || c == '\n'
})
Update:
As pointed by #GwynethLlewelyn, using TrimRight will be simpler (cleaner), i.e.
read_line = strings.TrimRight(read_line, "\r\n")
Internally, TrimRight function call TrimRightFunc, and will remove the character if it match any character given as the second argument of TrimRight.
Clean and Simple Approach
You can also use the strings.TrimSpace() function which will also remove any extra trailing or leading characters like the ones of regex \n, \r and this approach is much more cleaner.
read_line = strings.TrimSpace(read_line)
Go doc: https://pkg.go.dev/strings#TrimSpace
Related
I working on a language similar to ruby called gaiman and I'm using PEG.js to generate the parser.
Do you know if there is a way to implement heredocs with proper indentation?
xxx = <<<END
hello
world
END
the output should be:
"hello
world"
I need this because this code doesn't look very nice:
def foo(arg) {
if arg == "here" then
return <<<END
xxx
xxx
END
end
end
this is a function where the user wants to return:
"xxx
xxx"
I would prefer the code to look like this:
def foo(arg) {
if arg == "here" then
return <<<END
xxx
xxx
END
end
end
If I trim all the lines user will not be able to use a string with leading spaces when he wants. Does anyone know if PEG.js allows this?
I don't have any code yet for heredocs, just want to be sure if something that I want is possible.
EDIT:
So I've tried to implement heredocs and the problem is that PEG doesn't allow back-references.
heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
return text.join('');
}
It says that the marker is not defined. As for trimming I think I can use location() function
I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.
For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)
Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
I don't know if peg.js corresponds with those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication as to how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, and otherwise find a different parser generator which will work for you in this use case.
Here is the implementation of heredocs in Peggy successor to PEG.js that is not maintained anymore. This code was based on the GitHub issue.
heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
&{ return begin === end; }
/ '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`\\s{${min}}`);
return text.map(line => {
return line[0].replace(re, '');
}).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+
_ "whitespace"
= [ \t\n\r]* { return []; }
EDIT: above didn't work with another piece of code after heredoc, here is better grammar:
{ let heredoc_begin = null; }
heredoc = "<<<" beginMarker "\n" text:content endMarker {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`^\\s{${min}}`, 'mg');
return {
type: 'Literal',
value: text.replace(re, '')
};
}
__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*
I have such string:
"k1=v1; k2=v2; k3=v3"
Is there any simple way to make a map[string]string from it?
You will need to use a couple of calls to strings.Split():
s := "k1=v1; k2=v2; k3=v3"
entries := strings.Split(s, "; ")
m := make(map[string]string)
for _, e := range entries {
parts := strings.Split(e, "=")
m[parts[0]] = parts[1]
}
fmt.Println(m)
The first call will separate the different entries in the supplied string while the second will split the key/values apart. A working example can be found here.
I have a string like A=B&C=D&E=F, how to parse it into map in golang?
Here is example on Java, but I don't understand this split part
String text = "A=B&C=D&E=F";
Map<String, String> map = new LinkedHashMap<String, String>();
for(String keyValue : text.split(" *& *")) {
String[] pairs = keyValue.split(" *= *", 2);
map.put(pairs[0], pairs.length == 1 ? "" : pairs[1]);
}
Maybe what you really want is to parse an HTTP query string, and url.ParseQuery does that. (What it returns is, more precisely, a url.Values storing a []string for every key, since URLs sometimes have more than one value per key.) It does things like parse HTML escapes (%0A, etc.) that just splitting doesn't. You can find its implementation if you search in the source of url.go.
However, if you do really want to just split on & and = like that Java code did, there are Go analogues for all of the concepts and tools there:
map[string]string is Go's analog of Map<String, String>
strings.Split can split on & for you. SplitN limits the number of pieces split into like the two-argument version of split() in Java does. Note that there might only be one piece so you should check len(pieces) before trying to access pieces[1] say.
for _, piece := range pieces will iterate the pieces you split.
The Java code seems to rely on regexes to trim spaces. Go's Split doesn't use them, but strings.TrimSpace does something like what you want (specifically, strips all sorts of Unicode whitespace from both sides).
I'm leaving the actual implementation to you, but perhaps these pointers can get you started.
import ( "strings" )
var m map[string]string
var ss []string
s := "A=B&C=D&E=F"
ss = strings.Split(s, "&")
m = make(map[string]string)
for _, pair := range ss {
z := strings.Split(pair, "=")
m[z[0]] = z[1]
}
This will do what you want.
There is a very simple way provided by golang net/url package itself.
Change your string to make it a url with query params text := "method://abc.xyz/A=B&C=D&E=F";
Now just pass this string to Parse function provided by net/url.
import (
netURL "net/url"
)
u, err := netURL.Parse(textURL)
if err != nil {
log.Fatal(err)
}
Now u.Query() will return you a map containing your query params. This will also work for complex types.
Here is a demonstration of a couple of methods:
package main
import (
"fmt"
"net/url"
)
func main() {
{
q, e := url.ParseQuery("west=left&east=right")
if e != nil {
panic(e)
}
fmt.Println(q) // map[east:[right] west:[left]]
}
{
u := url.URL{RawQuery: "west=left&east=right"}
q := u.Query()
fmt.Println(q) // map[east:[right] west:[left]]
}
}
https://golang.org/pkg/net/url#ParseQuery
https://golang.org/pkg/net/url#URL.Query
How can I parse blocks of line comments with MGrammar?
I want to parse blocks of line comments. Line comments that are next to each should grouped in the MGraph output.
I'm having trouble grouping blocks of line comments together. My current grammar uses "\r\n\r\n" to terminate a block but that will not work in all cases such as at end of file or when I introduce other syntaxes.
Sample input could look like this:
/// This is block
/// number one
/// This is block
/// number two
My current grammar looks like this:
module MyModule
{
language MyLanguage
{
syntax Main = CommentLineBlock*;
token CommentContent = !(
'\u000A' // New Line
|'\u000D' // Carriage Return
|'\u0085' // Next Line
|'\u2028' // Line Separator
|'\u2029' // Paragraph Separator
);
token CommentLine = "///" c:CommentContent* => c;
syntax CommentLineBlock = (CommentLine)+ "\r\n\r\n";
interleave Whitespace = " " | "\r" | "\n";
}
}
The Problem is, that you interleave all whitespaces - so after parsing the tokens and coming to the lexer, they just "don't exist" anymore.
CommentLineBlock is syntax in your case, but you need the comment-blocks to be completely consumed in tokens...
language MyLanguage
{
syntax Main = CommentLineBlock*;
token LineBreak = '\u000D\u000A'
| '\u000A' // New Line
|'\u000D' // Carriage Return
|'\u0085' // Next Line
|'\u2028' // Line Separator
|'\u2029' // Paragraph Separator
;
token CommentContent = !(
'\u000A' // New Line
|'\u000D' // Carriage Return
|'\u0085' // Next Line
|'\u2028' // Line Separator
|'\u2029' // Paragraph Separator
);
token CommentLine = "//" c:CommentContent*;
token CommentLineBlock = c:(CommentLine LineBreak?)+ => Block {c};
interleave Whitespace = " " | "\r" | "\n";
}
But then the problem is, that the subtoken-rules in CommentLine won't be processed - you get plain strings parsed.
Main[
[
Block{
"/// This is block\r\n/// number one\r\n"
},
Block{
"/// This is block\r\n/// number two"
}
]
]
I might try to find a nicer way tonight :-)
using grep, vim's grep, or another unix shell command, I'd like to find the functions in a large cpp file that contain a specific word in their body.
In the files that I'm working with the word I'm looking for is on an indented line, the corresponding function header is the first line above the indented line that starts at position 0 and is not a '{'.
For example searching for JOHN_DOE in the following code snippet
int foo ( int arg1 )
{
/// code
}
void bar ( std::string arg2 )
{
/// code
aFunctionCall( JOHN_DOE );
/// more code
}
should give me
void bar ( std::string arg2 )
The algorithm that I hope to catch in grep/vim/unix shell scripts would probably best use the indentation and formatting assumptions, rather than attempting to parse C/C++.
Thanks for your suggestions.
I'll probably get voted down for this!
I am an avid (G)VIM user but when I want to review or understand some code I use Source Insight. I almost never use it as an actual editor though.
It does exactly what you want in this case, e.g. show all the functions/methods that use some highlighted data type/define/constant/etc... in a relations window...
(source: sourceinsight.com)
Ouch! There goes my rep.
As far as I know, this can't be done. Here's why:
First, you have to search across lines. No problem, in vim adding a _ to a character class tells it to include new lines. so {_.*} would match everything between those brackets across multiple lines.
So now you need to match whatever the pattern is for a function header(brittle even if you get it to work), then , and here's the problem, whatever lines are between it and your search string, and finally match your search string. So you might have a regex like
/^\(void \+\a\+ *(.*)\)\_.*JOHN_DOE
But what happens is the first time vim finds a function header, it starts matching. It then matches every character until it finds JOHN_DOE. Which includes all the function headers in the file.
So the problem is that, as far as I know, there's no way to tell vim to match every character except for this regex pattern. And even if there was, a regex is not the tool for this job. It's like opening a beer with a hammer. What we should do is write a simple script that gives you this info, and I have.
fun! FindMyFunction(searchPattern, funcPattern)
call search(a:searchPattern)
let lineNumber = line(".")
let lineNumber = lineNumber - 1
"call setpos(".", [0, lineNumber, 0, 0])
let lineString = getline(lineNumber)
while lineString !~ a:funcPattern
let lineNumber = lineNumber - 1
if lineNumber < 0
echo "Function not found :/"
endif
let lineString = getline(lineNumber)
endwhile
echo lineString
endfunction
That should give you the result you want and it's way easier to share, debug, and repurpose than a regular expression spit from the mouth of Cthulhu himself.
Tough call, although as a starting point I would suggest this wonderful VIM Regex Tutorial.
You cannot do that reliably with a regular expression, because code is not a regular language. You need a real parser for the language in question.
Arggh! I admit this is a bit over the top:
A little program to filter stdin, strip comments, and put function bodies on the same line. It'll get fooled by namespaces and function definitions inside class declarations, besides other things. But it might be a good start:
#include <stdio.h>
#include <assert.h>
int main() {
enum {
NORMAL,
LINE_COMMENT,
MULTI_COMMENT,
IN_STRING,
} state = NORMAL;
unsigned depth = 0;
for(char c=getchar(),prev=0; !feof(stdin); prev=c,c=getchar()) {
switch(state) {
case NORMAL:
if('/'==c && '/'==prev)
state = LINE_COMMENT;
else if('*'==c && '/'==prev)
state = MULTI_COMMENT;
else if('#'==c)
state = LINE_COMMENT;
else if('\"'==c) {
state = IN_STRING;
putchar(c);
} else {
if(('}'==c && !--depth) || (';'==c && !depth)) {
putchar(c);
putchar('\n');
} else {
if('{'==c)
depth++;
else if('/'==prev && NORMAL==state)
putchar(prev);
else if('\t'==c)
c = ' ';
if(' '==c && ' '!=prev)
putchar(c);
else if(' '<c && '/'!=c)
putchar(c);
}
}
break;
case LINE_COMMENT:
if(' '>c)
state = NORMAL;
break;
case MULTI_COMMENT:
if('/'==c && '*'==prev) {
c = '\0';
state = NORMAL;
}
break;
case IN_STRING:
if('\"'==c && '\\'!=prev)
state = NORMAL;
putchar(c);
break;
default:
assert(!"bug");
}
}
putchar('\n');
return 0;
}
Its c++, so just it in a file, compile it to a file named 'stripper', and then:
cat my_source.cpp | ./stripper | grep JOHN_DOE
So consider the input:
int foo ( int arg1 )
{
/// code
}
void bar ( std::string arg2 )
{
/// code
aFunctionCall( JOHN_DOE );
/// more code
}
The output of "cat example.cpp | ./stripper" is:
int foo ( int arg1 ) { }
void bar ( std::string arg2 ){ aFunctionCall( JOHN_DOE ); }
The output of "cat example.cpp | ./stripper | grep JOHN_DOE" is:
void bar ( std::string arg2 ){ aFunctionCall( JOHN_DOE ); }
The job of finding the function name (guess its the last identifier to precede a "(") is left as an exercise to the reader.
For that kind of stuff, although it comes to primitive searching again, I would recommend compview plugin. It will open up a search window, so you can see the entire line where the search occured and automatically jump to it. Gives a nice overview.
(source: axisym3.net)
Like Robert said Regex will help. In command mode start a regex search by typing the "/" character followed by your regex.
Ctags1 may also be of use to you. It can generate a tag file for a project. This tag file allows a user to jump directly from a function call to it's definition even if it's in another file using "CTRL+]".
u can use grep -r -n -H JOHN_DOE * it will look for "JOHN_DOE" in the files recursively starting from the current directory
you can use the following code to practically find the function which contains the text expression:
public void findFunction(File file, String expression) {
Reader r = null;
try {
r = new FileReader(file);
} catch (FileNotFoundException ex) {
ex.printStackTrace();
}
BufferedReader br = new BufferedReader(r);
String match = "";
String lineWithNameOfFunction = "";
Boolean matchFound = false;
try {
while(br.read() > 0) {
match = br.readLine();
if((match.endsWith(") {")) ||
(match.endsWith("){")) ||
(match.endsWith("()")) ||
(match.endsWith(")")) ||
(match.endsWith("( )"))) {
// this here is because i guessed that method will start
// at the 0
if((match.charAt(0)!=' ') && !(match.startsWith("\t"))) {
lineWithNameOfFunction = match;
}
}
if(match.contains(expression)) {
matchFound = true;
break;
}
}
if(matchFound)
System.out.println(lineWithNameOfFunction);
else
System.out.println("No matching function found");
} catch (IOException ex) {
ex.printStackTrace();
}
}
i wrote this in JAVA, tested it and works like a charm. has few drawbacks though, but for starters it's fine. didn't add support for multiple functions containing same expression and maybe some other things. try it.