How can I get the content of an html.Node - html-parsing

I would like to get data from a URL using the Go third-party library from http://godoc.org/code.google.com/p/go.net/html, but I ran into a problem: I couldn't get the content of an html.Node.
There's example code in the reference documentation; here it is:
s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul>`
doc, err := html.Parse(strings.NewReader(s))
if err != nil {
	log.Fatal(err)
}
var f func(*html.Node)
f = func(n *html.Node) {
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				fmt.Println(a.Val)
				break
			}
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		f(c)
	}
}
f(doc)
The output is:
foo
/bar/baz
What should I do if I want to get the link text instead, i.e.:
Foo
BarBaz

The tree of <a><strong>Foo</strong>Bar</a> looks basically like this:
ElementNode "a" (this node also includes a list of attributes)
    ElementNode "strong"
        TextNode "Foo"
    TextNode "Bar"
So, assuming that you want to get the plain text of the link (e.g. FooBar), you would have to walk through the tree and collect all text nodes. For example:
// collectText appends the text of every text node at or below n to buf,
// in document order.
func collectText(n *html.Node, buf *bytes.Buffer) {
	if n.Type == html.TextNode {
		buf.WriteString(n.Data)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectText(c, buf)
	}
}
And the change to your function:
var f func(*html.Node)
f = func(n *html.Node) {
	if n.Type == html.ElementNode && n.Data == "a" {
		// Gather all text below this <a> element, e.g. "Foo" or "BarBaz".
		text := &bytes.Buffer{}
		collectText(n, text)
		fmt.Println(text.String())
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		f(c)
	}
}

Related

Parsing a string in Rust

I want to parse the string
"{\"hello , world\",quote,\"\\\",\\\"q\"}" into vec!["hello , world", "quote", "\",\"q"]
I tried looping over the characters, checking for " and then splitting on ",", but there are many corner cases I couldn't solve, such as \" inside a quoted item.
I got it! Just loop over the characters and track whether you are inside a quoted item, wrapping bare items such as quote in double quotes. Then use serde_json to parse the result as a JSON array.
use serde_json::Value;
fn parse_code(code: &str) -> Vec<String> {
    let code = code.trim_end_matches('}').trim_start_matches('{');
    let mut ans = String::new();
    let mut start_string = false; // true while inside a quoted item
    let mut extra = false; // true while inside a bare item we wrapped in quotes ourselves
    let mut pre = ','; // pretend the input starts after a comma so a leading bare item is quoted too
    for ch in code.chars() {
        // An unescaped quote opens or closes a quoted item.
        if ch == '"' && pre != '\\' {
            start_string = !start_string;
        }
        // A bare item starts right after a comma: open a quote for it.
        if pre == ',' && !start_string {
            start_string = true;
            extra = true;
            ans.push('"');
        }
        // A comma ends a bare item: close the quote we opened.
        if ch == ',' && start_string && extra {
            start_string = false;
            extra = false;
            ans.push('"');
        }
        ans.push(ch);
        pre = ch;
    }
    if extra {
        ans.push('"');
    }
    let code = format!("[{}]", ans);
    serde_json::from_str::<Value>(&code)
        .unwrap()
        .as_array()
        .unwrap()
        .iter()
        .map(|x| x.as_str().unwrap().to_string())
        .collect()
}

Is there an arithmetic formula to test whether all given numbers are in a row, like [3 5 4]?

I'm making a card game in which 3 random numbers are generated. I need to check whether these numbers are in a row (consecutive),
e.g. 4 6 5 and 23, 24, 22 are in a row.
I have written a method, but I think there should be a simple arithmetic formula.
I have tried this and it works well, but I'd like a simple arithmetic formula that avoids using a list and a for loop:
bool isAllInRow(int num1, int num2, int num3) {
  // subject: tinpati
  List<int> numbers = [num1, num2, num3];
  bool is_in_row = true;
  numbers.sort();
  if (numbers[0] == 1 && numbers[1] == 12 && numbers[2] == 13)
    return true;
  for (int x = 0; x < numbers.length - 1; x++) {
    if (numbers[x] - numbers[x + 1] != -1) {
      is_in_row = false;
      break;
    }
  }
  return is_in_row;
}
So you want to know if the cards form a straight, with aces both low and high.
Is the "three cards" fixed, or would you want to generalize to more cards?
Sorting should be cheap for such a short list, so that's definitely a good start. Then you just need to check that the sorted values are consecutive, each one greater than the previous by exactly 1.
I'd do it as:
bool isStraight(List<int> cards) {
  var n = cards.length;
  if (n < 2) return true;
  cards.sort();
  var first = cards.first;
  if (first == 1 && cards[1] != 2) {
    // Pretend Ace is Jack if n == 3.
    // Accepts if remaining cards form a straight up to the King.
    first = 14 - n;
  }
  for (var i = 1; i < n; i++) {
    if (cards[i] != first + i) return false;
  }
  return true;
}
This code rejects card sets that have duplicates, or do not form a straight.
I think you are looking for an arithmetic progression (AP). Note that an AP alone is not enough: 3 5 7 is also an AP, so for cards in a row the common difference must also be 1.
bool checkForAP(List<int> numberArr) {
  numberArr.sort();
  int diff = numberArr[1] - numberArr[0];
  if (numberArr[2] - numberArr[1] != diff) {
    return false;
  }
  return true;
}
And modify your function like this:
bool isAllInRow(int num1, int num2, int num3) {
  // subject: tinpati
  List<int> numbers = [num1, num2, num3];
  numbers.sort();
  if (numbers[0] == 1 && numbers[1] == 12 && numbers[2] == 13)
    return true;
  // In a row means an AP whose common difference is exactly 1.
  return numbers[1] - numbers[0] == 1 && checkForAP(numbers);
}
Note: the sort inside checkForAP is redundant here, because isAllInRow already sorts the list, so you can remove it. Since your list always has 3 elements, I compared the differences directly; the same check can also be written for n numbers with a for loop:
bool checkForAP(List<int> numberArr) {
  numberArr.sort();
  int diff = numberArr[1] - numberArr[0];
  for (int i = 2; i < numberArr.length; i++) {
    if (numberArr[i] - numberArr[i - 1] != diff) {
      return false;
    }
  }
  return true;
}
You could do it like this:
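import 'dart:math';
bool isAllInRow(int num1, int num2, int num3) {
  // Any duplicate means the cards cannot be in a row.
  if (num1 == num2 || num2 == num3 || num1 == num3) return false;
  var maxNum = max(num1, max(num2, num3));
  var minNum = min(num1, min(num2, num3));
  // Three distinct values are consecutive exactly when max - min == 2;
  // 1, 12, 13 (sum 26) is the Ace-high special case.
  return (maxNum - minNum == 2) || (minNum == 1 && maxNum == 13 && num1 + num2 + num3 == 26);
}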
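The arithmetic behind the last answer can be written out as follows (a short derivation added for clarity, not taken from the original answers). Let m and M be the smallest and largest of the three card values a, b, c:

```latex
\begin{aligned}
&a, b, c \text{ in a row} \iff a, b, c \text{ pairwise distinct and } M - m = 2 \\
&\text{since } M - m = 2 \;\Longrightarrow\; \{a,b,c\} \subseteq \{m,\, m+1,\, m+2\},
  \text{ and three distinct values fill all three slots} \\
&\text{Ace high: } m = 1,\; M = 13,\; a + b + c = 26
  \;\Longrightarrow\; \text{middle value} = 26 - 1 - 13 = 12
\end{aligned}
```

The distinctness condition matters: 3, 5, 3 has M - m = 2 but is not a row, which is why all three pairs must be compared.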

How to make my Go parser code more readable

I'm writing a recursive descent parser in Go for a simple made-up language, so I'm designing the grammar as I go. My parser works, but I wanted to ask whether there are any best practices for how I should lay out my code, or when I should put code in its own function, etc., to make it more readable.
I've been building the parser by following the simple rules I've learned so far, i.e. each non-terminal is its own function. Even though my code works, I think it looks really messy and unreadable.
I've included the code for the assignment non-terminal, with its grammar in a comment above the function.
I've taken out most of the error handling to keep the function smaller.
Here are some examples of what that code can parse:
a = 10
a,b,c = 1,2,3
a int = 100
a,b string = "hello", "world"
Can anyone give me some advice as to how I can make my code more readable please?
// assignment : variable_list '=' expr_list
//            | variable_list type
//            | variable_list type '=' expr_list
func (p *Parser) assignment() ast.Noder {
	assignment := &ast.AssignmentNode{}
	assignment.Left = p.variable_list()
	// This if-statement deals with rule 2 or 3
	if p.currentToken.Type != token.ASSIGN {
		// Static variable declaration
		// Could be a declaration or an assignment
		// Only static variables can be declared without providing a value
		assignment.IsStatic = true
		assignment.Type = p.var_type().Value
		assignment.Right = nil
		p.nextToken()
		// Rule 2 is finished at this point in the code
		// This if-statement is for rule 3
		if p.currentToken.Type == token.ASSIGN {
			assignment.Operator = p.currentToken
			p.nextToken()
			assignment.Right = p.expr_list()
		}
	} else {
		// This deals with rule 1
		assignment.Operator = p.currentToken
		p.nextToken()
		assignment.Right = p.expr_list()
	}
	if assignment.Right == nil {
		for i := 0; i < len(assignment.Left); i++ {
			assignment.Right = append(assignment.Right, nil)
		}
	}
	if len(assignment.Left) != len(assignment.Right) {
		p.FoundError(p.syntaxError("variable mismatch, " + strconv.Itoa(len(assignment.Left)) + " on left but " + strconv.Itoa(len(assignment.Right)) + " on right,"))
	}
	return assignment
}
how I can make my code more readable?
For readability, which is a prerequisite for correct, maintainable code, let the code follow the grammar rules directly:
// assignment : variable_list '=' expr_list
//            | variable_list type
//            | variable_list type '=' expr_list
func (p *Parser) assignment() ast.Noder {
	assignment := &ast.AssignmentNode{}
	// variable_list
	assignment.Left = p.variable_list()
	// type
	if p.currentToken.Type != token.ASSIGN {
		// Static variable declaration
		// Could be a declaration or an assignment
		// Only static variables can be declared without providing a value
		assignment.IsStatic = true
		assignment.Type = p.var_type().Value
		p.nextToken()
	}
	// '=' expr_list
	if p.currentToken.Type == token.ASSIGN {
		assignment.Operator = p.currentToken
		p.nextToken()
		assignment.Right = p.expr_list()
	}
	// variable_list [expr_list]
	if assignment.Right == nil {
		for i := 0; i < len(assignment.Left); i++ {
			assignment.Right = append(assignment.Right, nil)
		}
	}
	if len(assignment.Left) != len(assignment.Right) {
		p.FoundError(p.syntaxError(fmt.Sprintf(
			"variable mismatch, %d on left but %d on right,",
			len(assignment.Left), len(assignment.Right),
		)))
	}
	return assignment
}
Note: this is likely inefficient and overly complicated:
for i := 0; i < len(assignment.Left); i++ {
	assignment.Right = append(assignment.Right, nil)
}
What is the type of assignment.Right?
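If assignment.Right is a slice of an interface type (an assumption; the question doesn't show its declaration), the padding loop can be replaced by a single make call, because make zero-fills the slice and the zero value of an interface is nil. A minimal sketch with hypothetical stand-in types:

```go
package main

import "fmt"

// Noder and AssignmentNode are hypothetical stand-ins for the asker's
// ast.Noder and ast.AssignmentNode; only the fields used here are modelled.
type Noder interface{}

type AssignmentNode struct {
	Left  []Noder
	Right []Noder
}

// padRight gives every variable on the left a nil value when no expression
// list was parsed. make allocates the slice once, already filled with nil,
// instead of growing it element by element with append.
func padRight(a *AssignmentNode) {
	if a.Right == nil {
		a.Right = make([]Noder, len(a.Left))
	}
}

func main() {
	a := &AssignmentNode{Left: []Noder{"a", "b"}}
	padRight(a)
	fmt.Println(len(a.Right), a.Right[0] == nil, a.Right[1] == nil) // 2 true true
}
```

One behavioral difference: with an empty Left, make returns an empty non-nil slice, while the append loop leaves Right nil; both have length 0, so the mismatch check is unaffected.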
As for how to make your code more readable, there is not always a cut-and-dried answer. I personally find code more readable when function names can take the place of comments. A lot of people recommend the book "Clean Code" by Robert C. Martin, which pushes this idea throughout: small functions that have one purpose and are self-documenting (via the function name).
Of course, as I said, this is a subjective topic. I took a crack at it and came up with the code below, which I personally find more readable. It also uses function names to document what is going on, so the reader doesn't need to dig into every statement; the high-level function names are enough unless they need all of the details.
// assignment : variable_list '=' expr_list
//            | variable_list type
//            | variable_list type '=' expr_list
func (p *Parser) assignment() ast.Noder {
	assignment := &ast.AssignmentNode{}
	assignment.Left = p.variable_list()
	// This if-statement deals with rule 2 or 3
	if p.currentToken.Type != token.ASSIGN {
		// Static variable declaration
		// Could be a declaration or an assignment
		// Only static variables can be declared without providing a value
		p.parseStaticStatement(assignment)
	} else {
		p.parseVariableAssignment(assignment)
	}
	if assignment.Right == nil {
		appendDefaultValues(assignment)
	}
	p.checkForUnbalancedAssignment(assignment)
	return assignment
}
func (p *Parser) parseStaticStatement(assignment *ast.AssignmentNode) {
	assignment.IsStatic = true
	assignment.Type = p.var_type().Value
	assignment.Right = nil
	p.nextToken()
	// Rule 2 is finished at this point in the code
	// This if-statement is for rule 3
	if p.currentToken.Type == token.ASSIGN {
		p.parseStaticAssignment(assignment)
	}
}
func (p *Parser) parseStaticAssignment(assignment *ast.AssignmentNode) {
	assignment.Operator = p.currentToken
	p.nextToken()
	assignment.Right = p.expr_list()
}
func (p *Parser) parseVariableAssignment(assignment *ast.AssignmentNode) {
	// This deals with rule 1
	assignment.Operator = p.currentToken
	p.nextToken()
	assignment.Right = p.expr_list()
}
// appendDefaultValues is a plain function rather than a method because Go
// does not allow declaring methods on a type from another package (ast).
func appendDefaultValues(assignment *ast.AssignmentNode) {
	for i := 0; i < len(assignment.Left); i++ {
		assignment.Right = append(assignment.Right, nil)
	}
}
func (p *Parser) checkForUnbalancedAssignment(assignment *ast.AssignmentNode) {
	if len(assignment.Left) != len(assignment.Right) {
		p.FoundError(p.syntaxError("variable mismatch, " + strconv.Itoa(len(assignment.Left)) + " on left but " + strconv.Itoa(len(assignment.Right)) + " on right,"))
	}
}
I hope that you find this helpful. I am more than willing to answer any further questions that you may have if you leave a comment on my response.

Mask sensitive URL query params

Say I have this URL
https://example.com:8080?private-token=foo&authenticity_token=bar
And I have a function to determine whether to mask a param.
How can I mask the URL while maintaining the order of the params?
Currently I have:
u, err := url.Parse(originalURL)
if err != nil {
	panic(err)
}
m, _ := url.ParseQuery(u.RawQuery)
for key := range m {
	if toMask(key) {
		m.Set(key, "FILTERED")
	}
}
u.RawQuery = m.Encode()
return u.String()
But this returns the URL with the params switched around:
https://example.com:8080?authenticity_token=FILTERED&private-token=FILTERED
First, the order of the params should not matter.
But I can see some situations where that does not hold (e.g. when you hash a URL). In that case, you should normalize the URL before using it.
Finally, to answer your question: you cannot keep the order if you go through ParseQuery, because url.Values is a map, maps are unordered, and Values.Encode sorts the parameters by key. You should therefore work on the query directly, using u.RawQuery.
u, err := url.Parse(originalURL)
if err != nil {
	panic(err)
}
// Split on "&", the separator used in the URL above.
parts := strings.Split(u.RawQuery, "&")
// parts is now ["private-token=foo", "authenticity_token=bar"]
for i, queryPart := range parts {
	// Cut at the first "=" only, so values containing "=" stay intact
	// (strings.Cut needs Go 1.18 or later).
	key, _, hasValue := strings.Cut(queryPart, "=")
	if hasValue && toMask(key) {
		parts[i] = key + "=FILTERED"
	}
}
u.RawQuery = strings.Join(parts, "&")
return u.String()
This code is just an example; you should handle special cases and errors more carefully. You could also use a regexp if you prefer.
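As a self-contained sketch of the same RawQuery approach, the function below wraps it with error handling. The names maskQuery and shouldMask are illustrative (shouldMask plays the role of the question's toMask). It additionally decodes each key with url.QueryUnescape before calling shouldMask, so a percent-encoded key such as private%2Dtoken is still recognised; that decoding step is an addition, not part of the answer above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// maskQuery replaces the value of every query parameter whose decoded key
// satisfies shouldMask with "FILTERED". It never goes through url.Values,
// so the parameters keep their original order.
func maskQuery(rawURL string, shouldMask func(key string) bool) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if u.RawQuery == "" {
		return u.String(), nil
	}
	parts := strings.Split(u.RawQuery, "&")
	for i, part := range parts {
		// Cut at the first "=" so values containing "=" are not split further.
		rawKey, _, hasValue := strings.Cut(part, "=")
		key, err := url.QueryUnescape(rawKey)
		if err != nil {
			key = rawKey // not valid percent-encoding: compare the raw key instead
		}
		if hasValue && shouldMask(key) {
			// Keep the key exactly as it appeared; only the value is replaced.
			parts[i] = rawKey + "=FILTERED"
		}
	}
	u.RawQuery = strings.Join(parts, "&")
	return u.String(), nil
}

func main() {
	// A hypothetical predicate for the two keys from the question.
	toMask := func(key string) bool {
		return key == "private-token" || key == "authenticity_token"
	}
	masked, err := maskQuery("https://example.com:8080?private-token=foo&authenticity_token=bar&page=2", toMask)
	if err != nil {
		panic(err)
	}
	fmt.Println(masked)
	// https://example.com:8080?private-token=FILTERED&authenticity_token=FILTERED&page=2
}
```

Returning the error instead of panicking lets the caller decide how to handle a malformed URL.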

Accessing a comment within a function in Go

I'm currently working on a documentation generator that will parse Go code to produce custom documentation. I need to access the comments written inside a function, but I can't retrieve these comments from the AST, nor with go/doc. Here is an example:
package main
import (
	"fmt"
	"go/doc"
	"go/parser"
	"go/token"
)
// GetFoo comments I can find easely
func GetFoo() {
	// Comment I would like to access
	test := 1
	fmt.Println(test)
}
func main() {
	fset := token.NewFileSet() // positions are relative to fset
	d, err := parser.ParseDir(fset, "./", nil, parser.ParseComments)
	if err != nil {
		fmt.Println(err)
		return
	}
	for k, f := range d {
		fmt.Println("package", k)
		p := doc.New(f, "./", 2)
		for _, t := range p.Types {
			fmt.Println("type", t.Name)
			fmt.Println("docs:", t.Doc)
		}
		for _, v := range p.Vars {
			fmt.Println("type", v.Names)
			fmt.Println("docs:", v.Doc)
		}
		for _, f := range p.Funcs {
			fmt.Println("type", f.Name)
			fmt.Println("docs:", f.Doc)
		}
		for _, n := range p.Notes {
			fmt.Println("body", n[0].Body)
		}
	}
}
It's easy to find "GetFoo comments I can find easely", but not "Comment I would like to access".
I have seen a quite similar question, Go parser not detecting Doc comments on struct type, but it is about exported types.
Is there any way to do that? Thank you!
The problem is that doc.New only extracts documentation comments, i.e. the comments attached to declarations, and a comment inside a function body is not part of that documentation.
You'll want to iterate over the AST of the files in the package directly: with parser.ParseComments, every comment in a file, including those inside function bodies, is recorded in ast.File.Comments.
package main
import (
	"fmt"
	"go/parser"
	"go/token"
)
// GetFoo comments I can find easely
func GetFoo() {
	// Comment I would like to access
	test := 1
	fmt.Println(test)
}
func main() {
	fset := token.NewFileSet() // positions are relative to fset
	pkgs, err := parser.ParseDir(fset, "./", nil, parser.ParseComments)
	if err != nil {
		fmt.Println(err)
		return
	}
	for name, pkg := range pkgs {
		fmt.Println("package", name)
		for fileName, file := range pkg.Files {
			fmt.Printf("File name: %q\n", fileName)
			// file.Comments lists every comment group in the file,
			// including the ones inside function bodies.
			for i, group := range file.Comments {
				fmt.Printf("Comment Group %d\n", i)
				for j, c := range group.List {
					// fset.Position turns the raw token.Pos into file:line:column.
					fmt.Printf("Comment %d: Position: %s, Text: %q\n", j, fset.Position(c.Slash), c.Text)
				}
			}
		}
	}
}
