I am trying to parse a sentence and, at the same time, convert numbers to their digit representation.
As a simple example I want the sentence
three apples
parsed and converted to
3 apples
With this simple code I can parse the sentence correctly and convert three into 3, but when I try to flatten the result, 3 is reverted back to three.
Parser three() => string('three').trim().map((value) => '3');
Parser apples() => string('apples').trim();
Parser sentence = three() & apples();
// this produces Success[1:13]: [3, apples]
print(sentence.parse('three apples'));
// this produces Success[1:13]: three apples
print(sentence.flatten().parse('three apples'));
Am I missing something? Is flatten behavior correct?
Thanks in advance
L
Yes, this is the documented behavior of flatten: It discards the result of the parser, and returns a sub-string of the consumed range in the input being read.
From the question it is not clear what you expect instead?
You might want to use the token parser: sentence.token().parse('three apples'). This will yield a Token object that contains both the parsed list [3, 'apples'] through Token.value and the consumed input string 'three apples' through Token.input.
Alternatively, you might want to transform the parsed list using a custom mapping function: sentence.map((list) => '${list[0]} ${list[1]}') yields '3 apples'.
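To make the difference between flatten and map concrete, here is a minimal, language-agnostic sketch written in Ruby. It is not petitparser's implementation: the helper names (literal, map_result, sequence, flatten_result) are hypothetical, and the whitespace handling only roughly approximates trim(). It shows why a flatten-style combinator returns the consumed input rather than the mapped values.

```ruby
# A parser here is a lambda: (input, pos) -> [value, next_pos], or nil on failure.
# All helper names are hypothetical and only illustrate the idea.

# Matches a literal and skips trailing whitespace (rough stand-in for trim()).
def literal(text)
  lambda do |input, pos|
    if input[pos, text.length] == text
      next_pos = pos + text.length
      next_pos += 1 while next_pos < input.length && input[next_pos].match?(/\s/)
      [text, next_pos]
    end
  end
end

# Transforms the value of a successful parse.
def map_result(parser, &block)
  lambda do |input, pos|
    result = parser.call(input, pos)
    result && [block.call(result[0]), result[1]]
  end
end

# Runs parsers one after another and collects their values in a list.
def sequence(*parsers)
  lambda do |input, pos|
    values = []
    ok = parsers.all? do |parser|
      result = parser.call(input, pos)
      next false unless result
      values << result[0]
      pos = result[1]
      true
    end
    ok ? [values, pos] : nil
  end
end

# Discards the parsed value and returns the consumed slice of the input.
def flatten_result(parser)
  lambda do |input, pos|
    result = parser.call(input, pos)
    result && [input[pos...result[1]], result[1]]
  end
end

three    = map_result(literal('three')) { |_| '3' }
apples   = literal('apples')
sentence = sequence(three, apples)

parsed    = sentence.call('three apples', 0)[0]
flattened = flatten_result(sentence).call('three apples', 0)[0]
joined    = map_result(sentence) { |list| list.join(' ') }.call('three apples', 0)[0]
```

In this sketch, parsed holds the mapped values, flattened holds the raw consumed text (so the mapping is lost), and joined rebuilds a string from the mapped values, which mirrors the map suggestion above.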
I'm curious about the syntax used in this example (https://learn.microsoft.com/en-us/dotnet/fsharp/get-started/get-started-command-line), within the file Library.fs.
My question, is the getJson function returning multiple values without a tuple?
Any link to F# documentation that explains this syntax would be nice. thanks.
open System.Text.Json
let getJson value =
    let json = JsonSerializer.Serialize(value)
    value, json
My question, is the getJson function returning multiple values without a tuple?
Yes to the first part, no to the second. The comma on the last line makes these two values a tuple.
You may think from online examples that a tuple is like (1, 2), but it’s just as fine to remove the parentheses if the expression is only on one line. In this case, value, json is the tuple.
Parentheses are used to disambiguate the order of evaluation. For instance, 1, "two", "three" is a three-tuple of an int and two strings, but 1, ("two", "three") is a two-tuple of an int and a second element that is itself a two-tuple of two strings.
The Microsoft Learning link appears to always use parentheses in the examples. This post goes a little further, and has a bit more to say on tuple deconstruction as well: https://fsharpforfunandprofit.com/posts/tuples/.
Here’s more on parentheses (thanks Brent!): if it has a comma, it’s a tuple.
Let's define deparse [1] as the inverse operation to q's native parse, so that the following holds:
q)aStringQExpression~deparse parse aStringQExpression
1b
Question
What's the definition of deparse function so that the above indeed works?
For example, in the below update expression, we know that "a*0^b+c-d" expression corresponds to (*;`a;(^;0;(+;`b;(-;`c;`d)))) parse tree:
q)-3!parse "update col:a*0^b+c-d from t"
"(!;`t;();0b;(,`col)!,(*;`a;(^;0;(+;`b;(-;`c;`d)))))"
So the envisaged deparse function should return:
q)deparse "(*;`a;(^;0;(+;`b;(-;`c;`d))))"
"a*0^b+c-d"
q)deparse "(!;`t;();0b;(,`col)!,(*;`a;(^;0;(+;`b;(-;`c;`d)))))"
"update col:a*0^b+c-d from t"
Motivation/Background/Use case:
Inline expressions are arguably faster for a human to read (left to right) than deeply nested parse trees. Although my code edits the parse tree programmatically in the background, it is useful for debugging or presentation to convert the resulting parse tree back into an inline expression string.
[1] Similar functionality is described here: http://adv-r.had.co.nz/Expressions.html#parsing-and-deparsing
This unparse repository on GitHub solves the problem. Amazing:
q).unparse.unparse parse "update col:a*0^b+c-d from t"
"update col:(a*(0^(b+(c-d)))) from t"
q).unparse.unparse parse "a*0^b+c-d"
"(a*(0^(b+(c-d))))"
I think the only way to do this would be to parse the list recursively and build up a string, e.g. for a dyadic:
q)deparse:{a:-3!'x;a[1],a[0],a[2]}
q)deparse parse "3*3"
"3*3"
So you can count last x to get its valency and build the string accordingly.
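The recursive idea can be sketched outside q as well. The following is a minimal Ruby illustration, assuming the parse tree is modelled as nested arrays of the form [operator, arg1, arg2]; the deparse method here is hypothetical and does not handle q's functional forms such as update or select. Because q evaluates right to left, only a nested left operand needs parentheses.

```ruby
# Hypothetical model: a parse tree is either an atom (symbol or number)
# or an array whose first element is the operator and the rest are arguments.
def deparse(node)
  return node.to_s unless node.is_a?(Array)

  op, *args = node
  case args.length
  when 1
    # Monadic application: operator followed by its argument.
    "#{op}#{deparse(args[0])}"
  when 2
    # Dyadic application: with right-to-left evaluation, only a nested
    # left operand needs parentheses.
    left = deparse(args[0])
    left = "(#{left})" if args[0].is_a?(Array)
    "#{left}#{op}#{deparse(args[1])}"
  else
    # Any other valency: fall back to bracket notation.
    "#{op}[#{args.map { |a| deparse(a) }.join(';')}]"
  end
end

tree = [:*, :a, [:^, 0, [:+, :b, [:-, :c, :d]]]]
expression = deparse(tree)
```

The case on args.length plays the role of checking the valency before deciding how to build the string.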
I am sending an array ("[1,3,44,2,0]") via an Ajax PATCH call, and it arrives as:
Parameters: {"ids"=>"[1,3,44,2,0]"}
To taint check, I am using the following line - in which the match anchors against the start and end of the string, and makes sure that there is at least one digit, or that the numbers are comma separated:
raise "unexpected ids #{params[:ids]}" unless params[:ids].match(/\A\[(\d+,)*\d+\]\z/)
And to make an actual integer array out of it, I am using the following approach (strip the brackets, split on comma, convert each string element to an integer):
irb> "[1,3,44,2,0]"[1...-1].split(',').map {|e| e.to_i}
=> [1, 3, 44, 2, 0]
Is there a better (simpler, cheaper, faster) way of doing this?
Try
JSON.parse(params[:ids])
But I think you should check your Ajax call. It must be possible to pass the array not as a string.
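As a rough sketch of how JSON.parse could replace both the regex check and the manual split, the helper below (parse_ids is a hypothetical name) parses the parameter and then verifies that the result is an array of integers. Note that this is slightly more permissive than the original regex: it also accepts whitespace, negative numbers, and an empty array.

```ruby
require 'json'

# Hypothetical helper: parses a JSON-encoded list of ids and rejects
# anything that is not an array of integers.
def parse_ids(raw)
  ids = JSON.parse(raw)
  unless ids.is_a?(Array) && ids.all? { |id| id.is_a?(Integer) }
    raise ArgumentError, "unexpected ids #{raw.inspect}"
  end
  ids
rescue JSON::ParserError
  # Malformed JSON is reported the same way as a wrong shape.
  raise ArgumentError, "unexpected ids #{raw.inspect}"
end

ids = parse_ids('[1,3,44,2,0]')
```

In a controller this would be called as parse_ids(params[:ids]); if the stricter format matters, the original regex check can be kept in front of it.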
Background
On the json.org website, a string is defined as "char+" where char+ is one or more char. A char is any unicode character except " or \. A subset of control characters are allowed, just escape them:
"foo" "2" "\\"
In Javascript, if you want to parse a string, it needs to be enclosed:
"\"foo\"" or '"foo"', but not "'foo'"
In Rails 3, the JSON gem that runs C or pure Ruby code is default.
As per the accepted answer, the gem parses JSON documents rather than elements. A document is either a collection in the form of key, value (object/hash) or values (array).
The problem
Strings
Let's say we want to parse the string foo. We would need to enclose it as "\"foo\"" or '"foo"'
JSON.parse("\"foo\"")
JSON.parse('"foo"')
yield
JSON::ParserError unexpected token at '"foo"'
meaning it can't parse "foo"
Numbers
The same goes for numbers: '3' or "3" will yield Needs at least two octets.
Larger numbers (an octet is a byte, so two UTF-8 characters are two bytes): '42' or "42" simply yield the same JSON::ParserError unexpected token at '42'
The workaround
The gem correctly parses these things if they are in an array: '["foo"]' or '[3]'
jsonstring = '"foo"'
JSON.parse( "[#{jsonstring}]" )[0]
yields
"foo"
This is ridiculous. Am I not understanding something correctly? Or is this gem bugged?
json.org states:
JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
Since "foo" is neither of the above, you are getting the error message. To make this more concrete, have a look at the Ruby JSON parser; the documentation states:
To create a valid JSON document you have to make sure, that the output is embedded in either a JSON array [] or a JSON object {}. The easiest way to do this, is by putting your values in a Ruby Array or Hash instance.
Hence what you are mentioning as "workaround" is actually the correct way to parse a string using the JSON parser.
Additional Info:
Parsing "\"foo\"" on jsonlint.com and json.parser.online.fr raises an error,
parsing ["foo"] passes the validation test on both the sites.
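The wrapping approach can be packaged as a small helper. This is only a sketch, and parse_json_value is a hypothetical name. It wraps the text in an array, parses it, and checks that exactly one value came out, so that input such as 1,2 is not silently accepted. As far as I know, newer versions of the json gem accept top-level scalars directly, in which case the wrapping is unnecessary but harmless.

```ruby
require 'json'

# Hypothetical helper: parses a single JSON value (scalar, array, or object)
# by embedding it in an array, which parsers that only accept documents allow.
def parse_json_value(text)
  wrapped = JSON.parse("[#{text}]")
  unless wrapped.length == 1
    raise JSON::ParserError, "expected exactly one JSON value"
  end
  wrapped.first
end

string_value = parse_json_value('"foo"')
number_value = parse_json_value('3')
```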
Suppose I want a function that takes a number and returns it as a string, exactly as it was given. The following doesn't work:
SetAttributes[foo, HoldAllComplete];
foo[x_] := ToString[Unevaluated@x]
The output for foo[.2] and foo[.20] is identical.
The reason I want to do this is that I want a function that can understand dates with dots as delimiters, e.g., f[2009.10.20]. I realize that's a bizarre abuse of Mathematica, but I'm making a domain-specific language and want to use Mathematica as the parser for it by just doing an eval (ToExpression). I can actually make this work if I can rely on double-digit days and months, like 2009.01.02, but I also want to allow 2009.1.2, and that ends up boiling down to the above question.
I suspect the only answer is to pass the thing in as a string and then parse it, but perhaps there's some trick I don't know. Note that this is related to this question: Mathematica: Unevaluated vs Defer vs Hold vs HoldForm vs HoldAllComplete vs etc etc
I wouldn't rely on Mathematica's float-parsing. Instead I'd define rules on MakeExpression for foo. This allows you to intercept the input, as boxes, prior to it being parsed into floats. This pair of rules should be a good starting place, at least for StandardForm:
MakeExpression[RowBox[{"foo", "[", dateString_, "]"}], StandardForm] :=
 With[{args = Sequence @@ Riffle[StringSplit[dateString, "."], ","]},
  MakeExpression[RowBox[{"foo", "[", "{", args, "}", "]"}], StandardForm]]
MakeExpression[RowBox[{"foo", "[", RowBox[{yearMonth_, day_}], "]"}],
  StandardForm] :=
 With[{args =
    Sequence @@ Riffle[Append[StringSplit[yearMonth, "."], day], ","]},
  MakeExpression[RowBox[{"foo", "[", "{", args, "}", "]"}], StandardForm]]
I needed the second rule because the notebook interface will "helpfully" insert a space if you try to put a second decimal place in a number.
EDIT: In order to use this from the kernel, you'll need to use a front end, but that's often pretty easy in version 7. If you can get your expression as a string, use UsingFrontEnd in conjunction with ToExpression:
UsingFrontEnd[ToExpression["foo[2009.09.20]", StandardForm]]
EDIT 2: There are a lot of possibilities if you want to play with $PreRead, which allows you to apply special processing to the input, as strings, before it is parsed.
$PreRead = If[$FrontEnd =!= Null, #1,
StringReplace[#,x:NumberString /; StringMatchQ[x,"*.*0"] :>
StringJoin[x, "`", ToString[
StringLength[StringReplace[x, "-" -> ""]] -
Switch[StringTake[StringReplace[x,
"-" -> ""], 1], "0", 2, ".", 1, _,
1]]]]] & ;
will display foo[.20] as foo[0.20]. The InputForm of it will be
foo[0.2`2.]
I find parsing and displaying number formats in Mathematica more difficult than
it should be...
Floats are, IIRC, parsed by Mathematica into actual Floats, so there's no real way to do what you want.