I'm trying to parse the following alternative strings with nom 5.0:
"A-Za-z0-9"
or
"A-Z|a-z|0-9"
I've tried the following, but to no avail:
pub enum Node {
    Range(Vec<u8>),
}

fn compound_range(input: &[u8]) -> IResult<&[u8], Node> {
    map(
        separated_list(
            alt((tag("|"), tag(""))),
            tuple((take(1usize), tag("-"), take(1usize))),
        ),
        |v: Vec<(&[u8], _, &[u8])>| {
            Node::Range(
                v.iter()
                    .map(|(s, _, e)| (s[0]..=e[0]).collect::<Vec<_>>())
                    .flatten()
                    .collect(),
            )
        },
    )(input)
}
Version 2:
fn compound_range(input: &[u8]) -> IResult<&[u8], Node> {
    alt((
        map(
            separated_list(tag("|"), tuple((take(1usize), tag("-"), take(1usize)))),
            |v: Vec<(&[u8], _, &[u8])>| {
                Node::Range(
                    v.iter()
                        .map(|(s, _, e)| (s[0]..=e[0]).collect::<Vec<_>>())
                        .flatten()
                        .collect(),
                )
            },
        ),
        map(
            many1(tuple((take(1usize), tag("-"), take(1usize)))),
            |v: Vec<(&[u8], _, &[u8])>| {
                Node::Range(
                    v.iter()
                        .map(|(s, _, e)| (s[0]..=e[0]).collect::<Vec<_>>())
                        .flatten()
                        .collect(),
                )
            },
        ),
    ))(input)
}
#[test]
fn parse_compound() {
    println!("{:?}", compound_range(b"A-Za-z0-9"));
    println!("{:?}", compound_range(b"A-Z|a-z|0-9"));
}
I can either get the first or the second one to parse but never both.
Is there a way to express this?
The problem is that nom always takes the first path it sees that somewhat works (as in, it doesn't have to consume all input). So what you ideally want to do is split the paths after the first "a-z" (or whatever) into one of two possible ones: either you treat | as a separator, or you don't.
This is because nom is a parser combinator library and doesn't work like a regex engine, which can backtrack as far as it needs to in order to find something that works.
Anyway, something like this should work:
fn compound_range(input: &[u8]) -> IResult<&[u8], Node> {
    let single_range = |input| {
        map(
            separated_pair(take(1usize), tag("-"), take(1usize)),
            |(l, r): (&[u8], &[u8])| (l[0], r[0]),
        )(input)
    };
    map(
        opt(map(
            pair(
                single_range,
                alt((
                    preceded(
                        tag("|"),
                        separated_nonempty_list(tag("|"), single_range),
                    ),
                    many0(single_range),
                )),
            ),
            |(first, rest)| {
                Node::Range(
                    std::iter::once(first)
                        .chain(rest)
                        // inclusive range, matching the ..= used in the question
                        .flat_map(|(l, r)| l..=r)
                        .collect(),
                )
            },
        )),
        |o| o.unwrap_or_else(|| Node::Range(Vec::new())),
    )(input)
}
Is there a better way? Probably. Given the specific task, it might actually make sense to implement that part of the parser you're writing manually. Does it work this way though? Probably. (I haven't tested it)
Also something to keep in mind: This might consume too much, if you expect some other stuff that fits the pattern after it.
I'm trying to draw some boxes in Rascal and trying to give each box its own callback function. On entering the box with the mouse the corresponding string should get displayed in the text element (so hovering box1 should display box1 etc.).
However, at the moment the text does pop up but just displays "box3" for each of the 3 boxes.
Any ideas?
strings = ["box1", "box2", "box3"];
boxes = [ box(
        size(100, 100),
        onMouseEnter(void() {
            output = s;
        })
    ) | s <- strings];
render(hcat([
    vcat(boxes),
    text(str () {return output;})
]));
Good question, classical problem. The essence of the problem is that Rascal uses "non-capturing closures": this means that functions that are returned from another function share the same context. In your case this is the variable s introduced by s <- strings. This nearly always happens when you create function values in a loop (as you do here). The solution is to wrap another function layer around the returned function.
Here is a simple example:
list[int()] makeClosures()
    = [ int() {return i;} | i <- [0,1,2]];

void wrong(){
    lst = makeClosures();
    println(lst[0]());
    println(lst[1]());
    println(lst[2]());
}
which will, surprisingly, print the values 2, 2 and 2. The solution is, as said, to introduce another function level:
int() makeClosure(int i)
    = int() { return i;};

list[int()] makeClosuresOK()
    = [ makeClosure(i) | i <- [0,1,2]];

void right(){
    lst = makeClosuresOK();
    println(lst[0]());
    println(lst[1]());
    println(lst[2]());
}
Now calling right() will print 0, 1 and 2 as expected.
I leave it as an exercise how this is done in your example, but I am prepared to give a solution when you ask for it. Good luck!
Trying to do something similar to this question, except allowing underscores from the second character onwards, not just camel case.
I can test the parser in isolation successfully, but when it is composed in a higher-level parser, I get errors.
Take the following example:
#![allow(dead_code)]
#[macro_use]
extern crate nom;

use nom::*;

type Bytes<'a> = &'a [u8];

#[derive(Clone, PartialEq, Debug)]
pub enum Statement {
    IF,
    ELSE,
    ASSIGN(String),
    IDENTIFIER(String),
    EXPRESSION,
}

fn lowerchar(input: Bytes) -> IResult<Bytes, char> {
    if input.is_empty() {
        IResult::Incomplete(Needed::Size(1))
    } else if (input[0] as char) >= 'a' && 'z' >= (input[0] as char) {
        IResult::Done(&input[1..], input[0] as char)
    } else {
        IResult::Error(error_code!(ErrorKind::Custom(1)))
    }
}
named!(identifier<Bytes, Statement>, map_res!(
    recognize!(do_parse!(
        lowerchar >>
        // alt_complete! is not having the effect it's supposed to,
        // so the alternatives need to be ordered from shortest to longest
        many0!(alt!(
            complete!(is_a!("_"))
            | complete!(take_while!(nom::is_alphanumeric))
        )) >> ()
    )),
    |id: Bytes| {
        //println!("{:?}", std::str::from_utf8(id).unwrap().to_string());
        Result::Ok::<Statement, &str>(
            Statement::IDENTIFIER(std::str::from_utf8(id).unwrap().to_string())
        )
    }
));

named!(expression<Bytes, Statement>, alt_complete!(
    identifier //=> { |e: Statement| e }
    //| assign_expr //=> { |e: Statement| e }
    | if_expr //=> { |e: Statement| e }
));

named!(if_expr<Bytes, Statement>, do_parse!(
    if_statement: preceded!(
        tag!("if"),
        delimited!(tag!("("), expression, tag!(")"))
    ) >>
    //if_statement: delimited!(tag!("("), tag!("hello"), tag!(")")) >>
    if_expr: expression >>
    //else_statement: opt_res!(tag!("else")) >>
    (Statement::IF)
));
#[cfg(test)]
mod tests {
    use super::*;
    use IResult::*;
    //use IResult::Done;

    #[test]
    fn ident_token() {
        assert_eq!(identifier(b"iden___ifiers"), Done::<Bytes, Statement>(b"", Statement::IDENTIFIER("iden___ifiers".to_string())));
        assert_eq!(identifier(b"iden_iFIErs"), Done::<Bytes, Statement>(b"", Statement::IDENTIFIER("iden_iFIErs".to_string())));
        assert_eq!(identifier(b"Iden_iFIErs"), Error(ErrorKind::Custom(1))); // Supposed to fail since not valid
        assert_eq!(identifier(b"_den_iFIErs"), Error(ErrorKind::Custom(1))); // Supposed to fail since not valid
    }

    #[test]
    fn if_token() {
        assert_eq!(if_expr(b"if(a)a"), Error(ErrorKind::Alt)); // Should have passed
        assert_eq!(if_expr(b"if(hello)asdas"), Error(ErrorKind::Alt)); // Should have passed
    }

    #[test]
    fn expr_parser() {
        assert_eq!(expression(b"iden___ifiers"), Done::<Bytes, Statement>(b"", Statement::IDENTIFIER("iden___ifiers".to_string())));
        assert_eq!(expression(b"if(hello)asdas"), Error(ErrorKind::Alt)); // Should have been able to recognise an IF statement via expression parser
    }
}
I have a string like A=B&C=D&E=F; how do I parse it into a map in Go?
Here is an example in Java, but I don't understand the split part:
String text = "A=B&C=D&E=F";
Map<String, String> map = new LinkedHashMap<String, String>();
for (String keyValue : text.split(" *& *")) {
    String[] pairs = keyValue.split(" *= *", 2);
    map.put(pairs[0], pairs.length == 1 ? "" : pairs[1]);
}
Maybe what you really want is to parse an HTTP query string, and url.ParseQuery does that. (What it returns is, more precisely, a url.Values storing a []string for every key, since URLs sometimes have more than one value per key.) It also handles URL percent-escapes (%0A, etc.), which plain splitting doesn't. You can find its implementation if you search in the source of url.go.
However, if you really do want to just split on & and = like that Java code does, there are Go analogues for all of the concepts and tools there:
map[string]string is Go's analogue of Map<String, String>.
strings.Split can split on & for you. SplitN limits the number of pieces, like the two-argument version of split() in Java does. Note that there might be only one piece, so you should check len(pieces) before trying to access pieces[1], say.
for _, piece := range pieces will iterate over the pieces you split.
The Java code relies on regexes to trim spaces. Go's Split doesn't use them, but strings.TrimSpace does something like what you want (specifically, it strips all sorts of Unicode whitespace from both sides).
I'm leaving the actual implementation to you, but perhaps these pointers can get you started.
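For illustration, the pointers above can be combined into a minimal sketch like the following (parsePairs is a name I made up; the whitespace trimming mimics what the Java regexes did):

```go
package main

import (
	"fmt"
	"strings"
)

// parsePairs splits "A=B&C=D" style input into a map, trimming
// whitespace around keys and values like the Java " *& *" / " *= *"
// regexes did.
func parsePairs(text string) map[string]string {
	m := make(map[string]string)
	for _, keyValue := range strings.Split(text, "&") {
		// At most 2 pieces, like the two-argument split() in Java.
		pieces := strings.SplitN(keyValue, "=", 2)
		key := strings.TrimSpace(pieces[0])
		value := ""
		if len(pieces) == 2 {
			value = strings.TrimSpace(pieces[1])
		}
		m[key] = value
	}
	return m
}

func main() {
	fmt.Println(parsePairs("A=B & C = D&E=F")) // map[A:B C:D E:F]
}
```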
import (
	"strings"
)

s := "A=B&C=D&E=F"
ss := strings.Split(s, "&")
m := make(map[string]string)
for _, pair := range ss {
	z := strings.Split(pair, "=")
	m[z[0]] = z[1]
}

This will do what you want, as long as every pair actually contains an = (otherwise z[1] will panic).
There is a very simple way provided by the Go net/url package itself.
Change your string to make it a URL with query params (note the ?, which marks the start of the query string): textURL := "method://abc.xyz/?A=B&C=D&E=F".
Now just pass this string to the Parse function provided by net/url:

import (
	netURL "net/url"
)

u, err := netURL.Parse(textURL)
if err != nil {
	log.Fatal(err)
}

Now u.Query() will return a map containing your query params. This will also work for complex types.
Here is a demonstration of a couple of methods:
package main

import (
	"fmt"
	"net/url"
)

func main() {
	{
		q, e := url.ParseQuery("west=left&east=right")
		if e != nil {
			panic(e)
		}
		fmt.Println(q) // map[east:[right] west:[left]]
	}
	{
		u := url.URL{RawQuery: "west=left&east=right"}
		q := u.Query()
		fmt.Println(q) // map[east:[right] west:[left]]
	}
}
https://golang.org/pkg/net/url#ParseQuery
https://golang.org/pkg/net/url#URL.Query
Result example:
{collisions=0, rx_bytes=258, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=3, tx_bytes=648, tx_dropped=0, tx_errors=0, tx_packets=8}
This format is like JSON, but it is not JSON.
Is there an easy way to parse it into a map[string]int, something like json.Unmarshal(data, &value)?
If that transport format is not recursively defined, i.e. a key cannot start a sub-structure, then its language is regular. As such, you can soundly parse it with Go's standard regexp package:
Playground link.
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

const data = `{collisions=0, rx_bytes=258, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=3, tx_bytes=648, tx_dropped=0, tx_errors=0, tx_packets=8}`

const regex = `([a-z_]+)=([0-9]+)`

func main() {
	ms := regexp.MustCompile(regex).FindAllStringSubmatch(data, -1)
	vs := make(map[string]int)
	for _, m := range ms {
		v, _ := strconv.Atoi(m[2])
		vs[m[1]] = v
	}
	fmt.Printf("%#v\n", vs)
}
The regexp solution by @thwd is elegant.
You can get a more efficient solution by using strings.Split() to split by comma-space (", ") to get the pairs, and then splitting again by the equals sign ("=") to get the key-value pairs. After that you just need to put these into a map:
func Parse(s string) (m map[string]int, err error) {
	if len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
		return nil, fmt.Errorf("Invalid input, no wrapping brackets!")
	}
	m = make(map[string]int)
	for _, v := range strings.Split(s[1:len(s)-1], ", ") {
		parts := strings.Split(v, "=")
		if len(parts) != 2 {
			return nil, fmt.Errorf("Equal sign not found in: %s", v)
		}
		if m[parts[0]], err = strconv.Atoi(parts[1]); err != nil {
			return nil, err
		}
	}
	return
}
Using it:
s := "{collisions=0, rx_bytes=258, ...}"
fmt.Println(Parse(s))
Try it on the Go Playground.
Note: if performance is important, this can be improved further by not using strings.Split() in the outer loop, but instead searching for the commas "manually", maintaining indices, and only slicing out the substrings that represent actual keys and values (that solution is just more complex).
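As a rough illustration of that idea (my own sketch, not benchmarked): walk the input once, slicing out each key and value by index instead of splitting the whole string into intermediate slices.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseManual sketches the index-based approach: it scans for the
// ", " separators itself and slices keys and values directly out of
// the input, avoiding the per-pair allocations of strings.Split.
func parseManual(s string) (map[string]int, error) {
	if len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
		return nil, fmt.Errorf("no wrapping brackets")
	}
	s = s[1 : len(s)-1]
	m := make(map[string]int)
	for start := 0; start <= len(s); {
		end := strings.Index(s[start:], ", ")
		if end < 0 {
			end = len(s)
		} else {
			end += start
		}
		pair := s[start:end]
		eq := strings.IndexByte(pair, '=')
		if eq < 0 {
			return nil, fmt.Errorf("equal sign not found in: %s", pair)
		}
		v, err := strconv.Atoi(pair[eq+1:])
		if err != nil {
			return nil, err
		}
		m[pair[:eq]] = v
		start = end + 2 // skip past ", "
	}
	return m, nil
}

func main() {
	m, err := parseManual("{collisions=0, rx_bytes=258, tx_packets=8}")
	fmt.Println(m, err) // map[collisions:0 rx_bytes:258 tx_packets:8] <nil>
}
```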
Another solution... but this option is much slower, so it is only viable if performance is not a key requirement: you can turn your input string into valid JSON and then use json.Unmarshal(). Error checks omitted:
s := "{collisions=0, rx_bytes=258, ...}"

// Turn into valid JSON:
s = strings.Replace(s, `=`, `":`, -1)
s = strings.Replace(s, `, `, `, "`, -1)
s = strings.Replace(s, `{`, `{"`, -1)

// And now simply unmarshal:
m := make(map[string]int)
json.Unmarshal([]byte(s), &m)
fmt.Println(m)
An advantage of this solution is that it also works if the destination value you unmarshal into is a struct:
// Unmarshal into a struct (you don't have to care about all fields)
st := struct {
	Collisions int `json:"collisions"`
	Rx_bytes   int `json:"rx_bytes"`
}{}
json.Unmarshal([]byte(s), &st)
fmt.Printf("%+v\n", st)
Try these on the Go Playground.
I want to use the Rust parser (libsyntax) to parse a Rust file and extract information like function names out of it. I started digging in the docs and code, so my first goal is a program that prints all function names of freestanding functions in a .rs file.
The program should expand all macros before it prints the function names, so functions declared via macro aren't missed. That's why I can't write some crappy little parser by myself to do the job.
I have to admit that I'm not yet perfectly good at programming Rust, so I apologize in advance for any stupid statements in this question.
How I understood it I need to do the following steps:
Parse the file via the Parser struct
Expand macros with MacroExpander
???
Use a Visitor to walk the AST and extract the information I need (eg. via visit_fn)
So here are my questions:
How do I use MacroExpander?
How do I walk the expanded AST with a custom visitor?
I had the idea of using a custom lint check instead of a fully fledged parser. I'm investigating this option.
If it matters, I'm using rustc 0.13.0-nightly (f168c12c5 2014-10-25 20:57:10 +0000)
You can use syntex to parse Rust, so you don't need to use unstable Rust.
Here's a simple example:
// Tested against syntex_syntax v0.33
extern crate syntex_syntax as syntax;

use std::rc::Rc;
use syntax::codemap::CodeMap;
use syntax::errors::Handler;
use syntax::errors::emitter::ColorConfig;
use syntax::parse::{self, ParseSess};

fn main() {
    let codemap = Rc::new(CodeMap::new());
    let tty_handler =
        Handler::with_tty_emitter(ColorConfig::Auto, None, true, false, codemap.clone());
    let parse_session = ParseSess::with_span_handler(tty_handler, codemap.clone());

    let src = "fn foo(x: i64) { let y = x + 1; return y; }".to_owned();
    let result = parse::parse_crate_from_source_str(String::new(), src, Vec::new(), &parse_session);
    println!("parse result: {:?}", result);
}
This prints the whole AST:
parse result: Ok(Crate { module: Mod { inner: Span { lo: BytePos(0), hi: BytePos(43), expn_id: ExpnId(4294967295) },
items: [Item { ident: foo#0, attrs: [], id: 4294967295, node: Fn(FnDecl { inputs: [Arg { ty: type(i64), pat:
pat(4294967295: x), id: 4294967295 }], output: Default(Span { lo: BytePos(15), hi: BytePos(15), expn_id: ExpnId(4294967295) }),
variadic: false }, Normal, NotConst, Rust, Generics { lifetimes: [], ty_params: [], where_clause: WhereClause { id:
4294967295, predicates: [] } }, Block { stmts: [stmt(4294967295: let y = x + 1;), stmt(4294967295: return y;)], expr:
None, id: 4294967295, rules: Default, span: Span { lo: BytePos(15), hi: BytePos(43), expn_id: ExpnId(4294967295) } }),
vis: Inherited, span: Span { lo: BytePos(0), hi: BytePos(43), expn_id: ExpnId(4294967295) } }] }, attrs: [], config: [],
span: Span { lo: BytePos(0), hi: BytePos(42), expn_id: ExpnId(4294967295) }, exported_macros: [] })
I'm afraid I can't answer your question directly, but I can present an alternative that might help.
If all you need is the AST, you can retrieve it in JSON format using rustc -Z ast-json. Then use your favorite language (Python is great) to process the output.
You can also get pretty-printed source using rustc --pretty=(expanded|normal|typed).
For example, given this hello.rs:
fn main() {
    println!("hello world");
}
We get:
$ rustc -Z ast-json hello.rs
{"module":{"inner":null,"view_items":[{"node":{"va... (etc.)
$ rustc --pretty=normal hello.rs
#![no_std]
#[macro_use]
extern crate "std" as std;
#[prelude_import]
use std::prelude::v1::*;
fn main() { println!("hello world"); }
$ rustc --pretty=expanded hello.rs
#![no_std]
#[macro_use]
extern crate "std" as std;
#[prelude_import]
use std::prelude::v1::*;
fn main() {
::std::io::stdio::println_args(::std::fmt::Arguments::new({
#[inline]
#[allow(dead_code)]
static __STATIC_FMTSTR:
&'static [&'static str]
=
&["hello world"];
__STATIC_FMTSTR
},
&match () {
() => [],
}));
}
If you need more than that though, a lint plugin would be the best option. Properly handling macro expansion, config flags, the module system, and anything else that comes up is quite non-trivial. With a lint plugin, you get the type-checked AST right away without fuss. Cargo supports compiler plugins too, so your tool will fit nicely into other people's projects.
The syn crate does indeed work. At first I wrongly thought it was only for writing procedural macros (as its readme suggests), but it can in fact parse a source-code file. See this page: https://docs.rs/syn/1.0.77/syn/struct.File.html . It even gives an example that reads a .rs file and prints the AST (of course, you can do anything with it, not just printing):
use std::env;
use std::fs::File;
use std::io::Read;
use std::process;

fn main() {
    let mut args = env::args();
    let _ = args.next(); // executable name

    let filename = match (args.next(), args.next()) {
        (Some(filename), None) => filename,
        _ => {
            eprintln!("Usage: dump-syntax path/to/filename.rs");
            process::exit(1);
        }
    };

    let mut file = File::open(&filename).expect("Unable to open file");
    let mut src = String::new();
    file.read_to_string(&mut src).expect("Unable to read file");

    let syntax = syn::parse_file(&src).expect("Unable to parse file");

    // Debug impl is available if Syn is built with "extra-traits" feature.
    println!("{:#?}", syntax);
}
Thanks @poolie for pointing out this hint (though it lacked a bit of detail).
syntex seems to no longer be maintained (last updated in 2017), but https://crates.io/crates/syn may do what you need.